CHAPTER 2 FACE and EYE DETECTION
2.3 Eye Position Detection
After the face region of an input image is located, we extract the eye position so that eye features can be measured for applications such as drowsiness detection. Many methods have been proposed to find the eye position; for example, Kawato et al. [25] use a circle-frequency filter to find a candidate “between-eyes” point. To serve any user without requiring a database, we adopt a simple method for eye position detection.
To determine the eye position, the maximum width of the face in the input image must be estimated first. Then, based on the fact that the eyeball lies at about one-fourth of the facial width, we can easily obtain the lateral eye position on the face.
The details of the process are summarized in Fig. 2.8. We consider the gray scale of an input image and draw two vertical lines, or more if necessary, around one-fourth of the facial width. As shown in Fig. 2.8(a), one line, $X_a$, does not cross the bulb of the eye, and another line, $X_b$, does. Fig. 2.8(b) depicts the changes in the gray scale values along line $X_a$ of Fig. 2.8(a). Likewise, the changes in the gray scale values along line $X_b$ are shown in Fig. 2.8(c).
Observing Fig. 2.8(a), we find two darker candidates along each of these lines, which are expected to have lower gray-level values. Where the gray-level value falls to a local minimum, the position may correspond to the eyebrow or the eyeball. Since the eyebrow corresponds to a lower gray value, from Figs. 2.8(b) and 2.8(c) we can infer that the valleys $A_1$ and $B_1$ are the positions where the eyebrow is crossed by lines $X_a$ and $X_b$, respectively. The eyeballs are also known to be most likely the darkest regions in the gray scale. Line $X_a$ does not pass through the pupil of the eye, but line $X_b$ does. Consequently, we can expect the gray scale value at $B_2$ (the second valley) in Fig. 2.8(c) to be smaller than that at $A_2$ in Fig. 2.8(b).
As a result, we can extract the point $B_2$ along line $X_b$ by selecting the minimum gray scale value along the Y axis, provided that this minimum is smaller than a threshold of about 30. Moreover, since the eyebrow lies at about one-fourth of the facial width, the line $X_b$ is easy to find by searching near that position. The eye position is then given by the detected valley point $B_2$ on a suitable scanning line $X_b$.
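The scan described above can be sketched as follows. This is a minimal illustration, not the implementation used in this thesis: the function names, the list-of-rows image layout, and the local-minimum test are assumptions made for the example; only the threshold of about 30 comes from the text.

```python
def find_eye_valley(column, threshold=30):
    """Return (y, value) of the darkest valley below `threshold`, or None.

    `column` holds the gray values along one vertical scanning line.
    A valley is taken to be a sample that is no larger than the sample
    above it and strictly smaller than the sample below it (assumption).
    """
    best = None
    for y in range(1, len(column) - 1):
        v = column[y]
        if v <= column[y - 1] and v < column[y + 1] and v < threshold:
            if best is None or v < best[1]:
                best = (y, v)
    return best


def scan_lines_for_eye(gray, xs, threshold=30):
    """Scan the columns `xs` of a gray image (list of rows) and return
    (x, y, value) of the darkest qualifying valley, or None."""
    best = None
    for x in xs:
        column = [row[x] for row in gray]
        hit = find_eye_valley(column, threshold)
        if hit is not None and (best is None or hit[1] < best[2]):
            best = (x, hit[0], hit[1])
    return best


# Synthetic example: line X_a misses the pupil, line X_b crosses it.
col_a = [200, 180, 60, 170, 150, 90, 160, 200]   # valleys A1 = 60, A2 = 90
col_b = [200, 175, 55, 165, 120, 20, 150, 200]   # valleys B1 = 55, B2 = 20
gray = [[a, b] for a, b in zip(col_a, col_b)]
print(scan_lines_for_eye(gray, [0, 1]))          # -> (1, 5, 20)
```

Only $B_2$ falls below the threshold, so the eyebrow valleys $A_1$ and $B_1$ and the non-pupil valley $A_2$ are rejected without any further rule.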
Fig. 2.8. Gray-level value variations along lines in (a). (b) Along $X_a$. (c) Along $X_b$.
2.4 Sunglasses Image Enhancement
The presence of sunglasses weakens the visibility of the eyes, and therefore the performance of eye position extraction drops. The sunglasses may overlap with the eyes so that the eyes cannot easily be separated from the detected eye regions.
Sunglasses with a deeper color cause more severe interference to eye detection. For accurate eye detection, it is essential to solve this problem. Using an infrared camera to capture the input images may be a possible [...] bad effect caused by sunglasses in eye detection.
2.4.1 Retinex Image Enhancement Technique
In this section, the goal is to enhance the sunglasses region so that the eye region can be extracted more reliably. Many image enhancement methods have been proposed. Typical methods include gamma correction, gain/offset application, histogram equalization, manual histogram adjustment, homomorphic filtering, and retinex image enhancement. These methods have different characteristics. Of these, the retinex method is among the most popular and widely used because it is simple to use and enhances images powerfully.
The retinex theory was first devised by Land [26]. It is sometimes also known as the Land Effect. Land's retinex theory of lightness and color constancy was one of the first computational models of an important form of perceptual constancy. Color constancy is the well known tendency for an object to always appear to have the same color, no matter what the viewing conditions are. In other words, a bright green post box appears green by daylight, by moonlight, and even under dingy street lights.
According to Land, we decide the color of something by comparing its ability to reflect short, medium and long wavelengths with that of adjoining objects. Land considered that the eye and the brain (the retina and cortex) form a single optical system, which he called the retinex.
The retinex image enhancement algorithm is an automatic image enhancement method that enhances a digital image in terms of dynamic range compression, color independence from the spectral distribution of the scene illuminant, and
color/lightness rendition. The digital images enhanced by the retinex image enhancement algorithm are much closer to the scene perceived by the human visual system, under all kinds and levels of lighting variations, than those enhanced by most other methods. A comparison of retinex with other image enhancement techniques can be found in [27].
Jobson et al. [28] defined a Single-Scale Retinex (SSR), which is an implementation of center/surround retinex. Depending on the spatial scale, however, it can provide either dynamic range compression (small scale) or tonal rendition (large scale), but not both simultaneously. A superposition of weighted SSR outputs at different scales is an obvious choice to balance these two effects. This is named Multi-Scale Retinex (MSR) [29], a generalization of the Single-Scale Retinex. For color images, if the content is out of the “gray world,” meaning that the spatial averages of the three color bands are far from equal, MSR forces the output toward gray. This problem can be solved by introducing weight factors for the different channels, as in Multi-Scale Retinex with Color Restoration (MSRCR) [30]. MSRCR combines the retinex dynamic range compression and color constancy with a color restoration that provides excellent color rendition.
After MSRCR, the outputs generally fall outside the display range. An automatic gain/offset can be used to shift and compress the histogram of the MSRCR outputs into the display domain.
In this thesis, we implement SSR, MSR, and MSRCR with gain/offset. We adjust the gain/offset parameters to bring most of the pixel values into the display domain and clip a small part of the values to improve the contrast.
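One way to realize such a gain/offset step is sketched below. The text does not state how the gain and offset are chosen, so the percentile-style rule used here (clip a fixed fraction of values at each end of the histogram, then stretch linearly) and the names `gain_offset` and `clip_fraction` are assumptions for illustration only.

```python
def gain_offset(values, clip_fraction=0.01, out_max=255):
    """Linearly map retinex outputs to [0, out_max], clipping a small
    fraction of the values at each end of the histogram (assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    cut = int(n * clip_fraction)        # number of values clipped per tail
    lo = ordered[cut]                   # offset: value mapped to 0
    hi = ordered[n - 1 - cut]           # value mapped to out_max
    if hi <= lo:                        # flat input: avoid division by zero
        return [0 for _ in values]
    gain = out_max / (hi - lo)
    out = []
    for v in values:
        scaled = gain * (v - lo)        # apply offset, then gain
        out.append(int(round(min(max(scaled, 0), out_max))))
    return out


# Four retinex outputs; with clip_fraction=0.25 one value per tail is clipped.
print(gain_offset([-2.0, -1.0, 1.0, 2.0], clip_fraction=0.25))  # -> [0, 0, 255, 255]
```

Clipping the tails spends the display range on the bulk of the pixel values, which is what improves the contrast; a larger `clip_fraction` gives more contrast at the cost of more saturated pixels.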
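The retinex is a member of the class of center/surround functions, where each output value is determined by the corresponding input value (the center) and its neighborhood (the surround). For the retinex, the center is each pixel value and the surround is a Gaussian function. The mathematical form of the SSR is given by

$$R_i(x, y) = \log I_i(x, y) - \log\left[F(x, y) * I_i(x, y)\right] \tag{2.4}$$

where $I_i(x, y)$ is the input image in the i-th spectral band, $R_i(x, y)$ is the output of the SSR process, and $*$ represents the convolution operator. $F(x, y)$ is the normalized surround function defined as

$$F(x, y) = k \exp\left[-\frac{x^2 + y^2}{\sigma^2}\right] \tag{2.5}$$

where $k$ is a normalization factor chosen such that

$$\iint F(x, y)\, dx\, dy = 1 \tag{2.6}$$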
Fig. 2.9 shows an original medical image and the images processed by SSR with different surround scales. The narrow and medium surround cases are self-explanatory. The wide surround case deserves some discussion because it appears to be the better output image. However, the lack of dynamic range obscures features that were visible to the observer, hence it may fail the test.
As shown in Fig. 2.9, the selection of the scale is related to the visual angle in direct observation. Because of the tradeoff between dynamic range compression and color rendition, we have to choose a good scale σ in the formula of F(x,y) in SSR.
However, suitable scales often differ from image to image. If we do not want to sacrifice either dynamic range compression or color rendition, the Multi-Scale Retinex, which is a weighted combination of SSR outputs at different scales, is a good solution. It is given by

$$R_i(x, y) = \sum_{k=1}^{K} W_k \left\{ \log I_i(x, y) - \log\left[F_k(x, y) * I_i(x, y)\right] \right\}, \quad i = 1, \ldots, N \tag{2.7}$$

where i represents the i-th spectral band and N is the number of spectral bands: N = 1 for grayscale images and N = 3 for typical color images. In the latter case, $i \in \{R, G, B\}$. $R_i(x, y)$ is the output of the MSR process, $W_k$ are the weights associated with $F_k$, and K is the number of surround functions, or scales. $F_k$ represents the k-th surround function and is defined as:

$$F_k(x, y) = k \exp\left[-\frac{x^2 + y^2}{\sigma_k^2}\right] \tag{2.8}$$

where $\sigma_k$ are the scales that control the extent of the surround function and the amount of spatial detail that is retained, and the leading factor $k$ is again the normalization factor, not the scale index. Fig. 2.10 shows the MSR result for the previous medical example. The MSR-processed image uses features from all three scales to provide dynamic range compression and tonal rendition simultaneously. It has significant dynamic range compression at the boundary between the lighted and dark parts, and reasonable color rendition over the whole image.
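The SSR and MSR definitions above can be sketched in plain Python for a single-band image stored as a list of rows. This is an illustrative sketch under stated assumptions, not the implementation used in this thesis: the separable evaluation of the Gaussian surround, the border replication, the kernel radius of 3σ, and the +1 added before each logarithm (to avoid log 0) are all choices made for the example. The default scales 15, 80, and 250 follow Fig. 2.10.

```python
import math


def gaussian_kernel(sigma):
    """1-D samples of exp(-x^2 / sigma^2), normalized to sum to one."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for j, w in enumerate(kernel):
                xx = min(max(x + j - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def surround(image, sigma):
    """F * I: the 2-D Gaussian is separable, so blur rows, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def ssr(image, sigma):
    """Single-scale retinex: log I - log(F * I), with +1 to avoid log(0)."""
    blurred = surround(image, sigma)
    return [[math.log(p + 1.0) - math.log(b + 1.0)
             for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]


def msr(image, sigmas=(15, 80, 250), weights=None):
    """Multi-scale retinex: weighted sum of SSR outputs (equal weights
    by default)."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for sigma, w in zip(sigmas, weights):
        single = ssr(image, sigma)
        for y in range(height):
            for x in range(width):
                out[y][x] += w * single[y][x]
    return out
```

On a constant image the surround equals the center everywhere, so both `ssr` and `msr` return zeros; a pixel brighter than its neighborhood gets a positive output. The raw outputs are small signed numbers and still need a gain/offset step before display.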
In practice, the suitable number of scales for the MSR is application dependent. However, experiments showed that three scales, representing narrow, medium, and wide surrounds, are often enough for most images. The weights can be chosen equal, or adjusted to put more weight on dynamic range compression or on color rendition. [...] goal that we are trying to achieve. The narrow surround acts as a high-pass filter, capturing more fine details in the image but at a severe loss of tonal information. The wide surround captures more fine tonal information but at the loss of dynamic range.
The medium surround strikes a balance between dynamic range and tonal information. The MSR is the average of the three renditions and has characteristics close to those of the medium-surround result.
Fig. 2.9. SSR with different scales. (a) Original image. (b) SSR with narrow surround scale 15. (c) SSR with medium surround scale 80. (d) SSR with wide surround scale 250.
Fig. 2.10. Result of MSR with scales = 15, 80, and 250.
Observing Fig. 2.11, we find that the color rendition of the SSR and MSR results deviates to a certain degree from the original image: the results look close to gray images. Poor performance on color images is in fact the weakness of SSR and MSR. MSR works well for gray images, but it can be a problem for color images because it does not consider the relative intensity of the color bands. This can be seen from the formula of MSR, whose output is the relative reflectance in the spatial domain. For images “out of the gray world,” whose average intensities in the three color bands are far from equal, the MSR outputs of the three channels become closer to each other, which makes the result look more gray. The solution to this problem is to apply weights to the three color channels according to the relative intensity of the three channels in the original image. A color restoration factor is computed in the following form:

$$C_i(x, y) = \beta \left\{ \log\left[\alpha I_i(x, y)\right] - \log\left[\sum_{n=1}^{N} I_n(x, y)\right] \right\} \tag{2.9}$$

where $C_i(x, y)$ is the color restoration coefficient in the i-th spectral band, N is the number of spectral bands, $I_i$ is the i-th spectral band in the input image, β is a gain constant, and α controls the strength of the non-linearity. Analogous to the spatial operation of the retinex, which utilizes a log operator, the internal form of the color restoration process is essentially the same as that of the retinex process. Combining Eq. (2.9) with Eq. (2.7), the MSRCR is given by

$$R_{\mathrm{MSRCR}_i}(x, y) = C_i(x, y) \sum_{k=1}^{K} W_k \left\{ \log I_i(x, y) - \log\left[F_k(x, y) * I_i(x, y)\right] \right\} \tag{2.10}$$
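The color restoration step acts on each pixel independently, so it can be illustrated on a single RGB triple. The sketch below assumes the usual form of the factor, β times the difference between log(α·I_i) and the log of the sum over all bands; the values of `alpha` and `beta` are placeholders rather than settings taken from this thesis, and the +1 added to each band (to avoid log 0) is likewise an assumption.

```python
import math


def color_restoration(pixel, alpha=125.0, beta=46.0):
    """Color restoration factor C_i for each band of one pixel.

    alpha and beta are placeholder values; +1 is added to every band so
    that the logarithm is defined for zero intensities (assumption).
    """
    shifted = [c + 1.0 for c in pixel]
    total = sum(shifted)
    return [beta * (math.log(alpha * c) - math.log(total)) for c in shifted]


def msrcr_pixel(pixel, msr_values, alpha=125.0, beta=46.0):
    """Multiply the per-band MSR outputs by the color restoration factors."""
    factors = color_restoration(pixel, alpha, beta)
    return [c * r for c, r in zip(factors, msr_values)]


gray_factors = color_restoration((100, 100, 100))   # equal for all bands
red_factors = color_restoration((200, 50, 50))      # largest for the red band
print(gray_factors, red_factors)
```

A gray pixel receives the same factor in every band, so MSRCR leaves its balance untouched, while a pixel dominated by one band receives a larger factor in that band. This is how the relative intensity of the bands, which MSR discards, is put back.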
An integral scheme of MSRCR is given in Fig. 2.12. To observe the effect of MSRCR on color images, we apply MSRCR to the same input image as in Fig. 2.11(a); the result is shown in Fig. 2.13. Comparing Fig. 2.13 with Fig. 2.11, we can easily see that the MSRCR result has better color rendition, closer to the original image, than the SSR or MSR results.
Reflections on glasses or sunglasses also affect eye detection. More serious reflections cause stronger interference, whether or not the sunglasses region is processed by image enhancement techniques. When a serious reflection happens to overlap the eyes, the accuracy of eye detection drops critically. Solving this problem is essential for an eye detection system. In Chapter 3, we further discuss these conditions and attempt to overcome them.
Fig. 2.11. (a) Original image. (b) Narrow surround (scale = 15). (c) Medium surround (scale = 80). (d) Wide surround (scale = 250). (e) MSR output with scales = 15, 80, and 250.
Fig. 2.12. Integral scheme of MSRCR.
Fig. 2.13. (a) Original image. (b) MSRCR output with scales = 15, 80, and 250.
2.4.2 Histogram Equalization Enhancement Technique
To analyze the performance of the retinex image enhancement techniques, we compare the retinex result with another image enhancement technique: histogram equalization. Histogram equalization is based on the idea of remapping the histogram of the image to one that has a near-uniform probability density function. This reassigns dark regions to brighter values and bright regions to darker values. Consequently, histogram equalization depends strongly on the gray scale distribution of the input image. The probability of occurrence of gray level $r_k$ in an image is defined by

$$p(r_k) = \frac{n_k}{n}, \quad k = 0, 1, \ldots, L-1 \tag{2.11}$$

where n is the total number of pixels in the image, $n_k$ is the number of pixels that have gray level $r_k$, and L is the total number of possible gray levels in the image. The transformation function of histogram equalization is given as

$$s_k = T(r_k) = \sum_{j=0}^{k} p(r_j) = \sum_{j=0}^{k} \frac{n_j}{n}, \quad k = 0, 1, \ldots, L-1 \tag{2.12}$$

A processed image is obtained by mapping each pixel with gray level $r_k$ in the input image into a corresponding pixel with gray level $s_k$ in the output image.
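The mapping from $r_k$ to $s_k$ can be sketched in a few lines. This is a minimal illustration that assumes an image stored as a list of rows of integer gray levels; scaling the cumulative sum by L − 1 to obtain integer output levels is a common convention and an assumption here, as is the function name `equalize`.

```python
def equalize(image, levels=256):
    """Histogram-equalize a gray image given as a list of rows of ints."""
    n = sum(len(row) for row in image)
    hist = [0] * levels
    for row in image:
        for r in row:
            hist[r] += 1                 # n_k: pixels with gray level r_k
    mapping = []
    cumulative = 0
    for count in hist:
        cumulative += count              # running sum of n_j for j <= k
        # s_k = cumulative / n lies in [0, 1]; scale it to 0 .. levels - 1
        mapping.append(int(round((levels - 1) * cumulative / n)))
    return [[mapping[r] for r in row] for row in image]


# A mostly dark image: three pixels at level 0 and one at level 1.
print(equalize([[0, 0, 0, 1]]))          # -> [[191, 191, 191, 255]]
```

In this small example the large concentration of pixels at level 0 is mapped directly to level 191, because the cumulative sum already reaches 3/4 at the first gray level.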
Histogram equalization works well for scenes that have unimodal or weakly bi-modal histograms, i.e., very dark or very bright scenes, but it is not effective for images with strongly bi-modal histograms, i.e., images containing very dark and very [...] characterized by a large concentration of pixels in the dark end of the gray scale. One might think that histogram equalization would be a good approach to enhance this image, so that details in the dark regions would become more visible. However, the result in Fig. 2.14(c) shows that histogram equalization did not in fact produce a particularly good result in this case. The reason can be seen by studying the histogram of the equalized image shown in Fig. 2.14(d): the intensity levels have been shifted to the upper half of the gray scale, giving the image a washed-out appearance. The cause of the shift is the large concentration of dark components at or near 0 in the original histogram. The cumulative transformation function obtained from this histogram increases steeply, and thus maps the large concentration of pixels at the low end of the gray scale to the high end of the scale. It is better to apply histogram equalization once more to the bi-modal image of Fig. 2.14(c); the resulting image is shown in Fig. 2.15(c) and can be compared with the MSRCR result shown in Fig. 2.15(d). Figs. 2.16–2.19 show four examples of eye images processed by histogram equalization and by MSRCR. As we can see, MSRCR provides the better overall visual quality. In our experience, eye region enhancement by histogram equalization does not perform consistently: it is sometimes good and sometimes bad. The MSRCR technique, however, performs consistently well.
Fig. 2.14. Illustration of histogram equalization. (a) Original image. (b) Histogram of (a). (c) Image processed by histogram equalization. (d) Histogram of (c).
Fig. 2.15. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result of (a). (c) Histogram equalization result of (b). (d) MSRCR result.
Fig. 2.16. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result. (c) MSRCR result.
Fig. 2.17. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result. (c) MSRCR result.
Fig. 2.18. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result. (c) MSRCR result.
Fig. 2.19. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result. (c) MSRCR result.
Chapter 3
Reflection Separation
3.1 Image with Reflection
When we view a scene through transparent glass, the image is often similar to those shown in Fig. 3.1. Perceptually, we can view each of these images as a superposition of two images: the foreground and the reflection. Hence each image can be decomposed into two transparent layers. We need a computer vision algorithm to find this decomposition. Mathematically, the problem can be modeled as follows. Given an image $I(x, y)$, we wish to find two layers, $I_1(x, y)$ and $I_2(x, y)$, such that:

$$I(x, y) = I_1(x, y) + I_2(x, y) \tag{3.1}$$
This problem is obviously difficult because there are two variables but only one equation. If we have no additional prior knowledge, there will be an infinite number of possible decompositions. In this chapter, we adopt an algorithm that can separate reflections from images using a single input image. The algorithm is based on a cost function: it favors decompositions which have a small number of edges and corners.
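To make the counting argument concrete, the short derivation below shows why Eq. (3.1) alone cannot fix the two layers. It is a sketch that assumes an image with $P$ pixels; the symbol $P$ is introduced here only for illustration.

```latex
% One equation per pixel, but two unknown layer values per pixel.
\begin{align*}
I(x, y) &= I_1(x, y) + I_2(x, y)
  && \text{($P$ equations, $2P$ unknowns)} \\
I_2(x, y) &= I(x, y) - I_1(x, y)
  && \text{(a valid second layer for any choice of $I_1$)}
\end{align*}
```

Every choice of $I_1$ therefore yields a decomposition that satisfies Eq. (3.1) exactly, which is why an additional preference, here a cost on edges and corners, is needed to select one.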
3.2 Cost Function, Edge and Corner
Consider a simple image which is the superposition of two squares in Fig. 3.2(a).
We want to decompose the image into two layers. There is an infinite number of possible decompositions. Figs. 3.2(b)–(e) show some possible decompositions, including the perceptually “correct” decomposition (Fig. 3.2(e)). What rule is the “correct” decomposition based on? One reason is that it has the smallest total number of edges and corners among the decompositions shown in Fig. 3.2. The original image has ten corners: four from each square and two caused by the superposition of the two squares. When we decompose the image into two squares, the two corners caused by the superposition disappear and just eight corners are left. The decompositions shown in Figs. 3.2(b) and 3.2(d) increase the number of corners and edges. Clearly, the decomposition shown in Fig. 3.2(e) has a smaller total number of edges and corners than the other decompositions in Fig. 3.2, and it looks perceptually appropriate.
How do we translate the preference for a small number of edges and corners into a cost function? We need operators that act as edge and corner detectors, and we also need a mathematical form that assigns a cost to an image. Next we describe how to define a cost function based on the statistics of natural scenes.
A remarkably robust property of natural images that has received much attention lately is that when derivative filters are applied to natural images, the filter outputs tend to be sparse [31], [32]. Fig. 3.3 illustrates this fact. We take two arbitrary natural images and apply a horizontal derivative filter to each. The histograms of their derivative filter outputs are peaked at zero and fall off much faster than a Gaussian. As a result, the derivative filter outputs are concentrated at zero and are therefore sparse. Similar histograms are observed for vertical derivative filters and for the gradient magnitude $\|\nabla I\|$.
Fig. 3.3. Two natural images and the histograms of their derivative filter outputs. (a), (c) Natural image. (b), (d) Histogram of derivative filter outputs.
Subsequently, we convert each of the histograms of the filter outputs in Fig. 3.3 into a log histogram and show the results in Fig. 3.5. Observing Fig. 3.5, we find that the distributions are similar to an exponential density with exponent less than 1. For comparison, we show the log probability for densities of the form $p(x) = e^{-x^{\alpha}}$, which is presented in [33]. We plot the corresponding log probabilities for $\alpha > 1$, $\alpha = 1$, and $\alpha < 1$, respectively, for $x \ge 0$. The results are shown in Fig. 3.4. Comparing Figs. 3.5(b) and 3.5(d) with Fig. 3.4, we find that the statistics of natural images under derivative filters have the qualitative nature of a distribution $e^{-x^{\alpha}}$ with $\alpha < 1$. Like the derivative filters, the gradient magnitude also has this characteristic. When we define a cost function, we will use this characteristic of the derivative filters and the gradient magnitude operator. More details about the edge detector and the gradient magnitude are given in Sec. 3.2.1.
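Why an exponent below 1 matters for a cost function can be seen with a small numerical sketch. It assumes that the cost of a set of filter outputs is their negative log probability under the density above, that is, the sum of |x|^α; the function names and the value α = 0.7 are illustrative assumptions and not the cost function defined later in this chapter.

```python
def horizontal_derivative(row):
    """Differences between horizontally adjacent pixels of one image row."""
    return [b - a for a, b in zip(row, row[1:])]


def sparse_cost(derivatives, alpha=0.7):
    """Negative log probability, up to a constant, under exp(-|x|^alpha)."""
    return sum(abs(d) ** alpha for d in derivatives)


sharp = horizontal_derivative([0, 0, 1, 1])      # one edge of height 1
smeared = horizontal_derivative([0, 0.5, 1, 1])  # two edges of height 0.5

print(sparse_cost(sharp, 0.7), sparse_cost(smeared, 0.7))
print(sparse_cost(sharp, 2.0), sparse_cost(smeared, 2.0))
```

With α < 1 the single sharp edge is cheaper than the same intensity change spread over two smaller edges, so the cost favors few, strong edges. With α = 2, which corresponds to a Gaussian, the order is reversed and the smeared version would be preferred.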
Now we consider the other operator, “corner detector.” In this paper we use a Harris-like operator c(x, y) as a corner detector. There are more detailed descriptions