CHAPTER 2 FACE and EYE DETECTION
2.4 Sunglasses Images Enhancement
2.4.2 Histogram Equalization Enhancement Technique
In order to analyze the performance of the retinex image enhancement techniques, we compare the retinex result with another image enhancement technique:
histogram equalization technique. Histogram equalization technique is based on the idea of remapping the histogram of the image to a histogram that has a near-uniform probability density function. This will result in reassigning dark regions to brighter values and bright regions to darker values. Consequently, the histogram equalization technique deeply depends on the distribution of gray scale of input images. The probability of occurrence of gray level rk in an image is defined by
( )
k 0,1,..., 1r k
p r n k
n L
= = − (2.11)
where n is the total number of pixels in the image, is the number of pixels that have gray level , and L is the total number of possible gray levels in the image. The transformation function of Histogram equalization is given as
nk
A processed image is obtained by mapping each pixel with gray level in the input image into a corresponding pixel with gray level in the output image.
rk
sk
Histogram equalization works well for scenes that have unimodal or weakly bi-modal histograms, i.e., very dark, or very bright ones, but it is not effective to those images with strongly bi-modal histograms, i.e., images containing very dark and very
characterized by a large concentration of pixels in the dark end of the gray scale. One might think that histogram equalization would be a good approach to enhance this image, so that details in the dark regions would become more visible. However, the result in Fig. 2.14(c) shows that histogram equalization in fact did not produce a particularly good result in this case. The reason for this can be seen by studying the histogram of the equalized image shown in Fig. 2.14(d). We can see that the intensity levels have been shifted to the upper one-half of the gray scale, thus giving the image a washed-out appearance. The cause of the shift is the large concentration of dark components at or near 0 in the original histogram. The cumulative transformation function obtained from this histogram is steeply increasing, and thus mapping the large concentration of pixels in the low end of the gray scale to the high end of the scale. It should be better to re-do histogram equalization once for a bi-modal image Fig. 2.14(c), the resulting image is shown in Fig. 2.15(c), compared with the MSRCR result shown in Fig. 2.15(d). Figs. 2.16–2.19 show four examples of eyes images processed by histogram equalization and MSRCR, respectively. As we can see, the MSRCR provided the better overall visual quality. By our experience, eye region enhancement by histogram equalization does not perform consistently, i.e., sometimes good and sometimes bad. However, the MSRCR technique performs constantly well.
.
(a) (b)
(c) (d)
Fig. 2.14. Illustration of histogram equalization. (a) Original image. (b) Histogram of (a). (c) Image processed by histogram equalization. (d) Histogram of (c).
(a) (b)
(c) (d)
Fig. 2.15. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result of (a). (c) Histogram equalization result of (b). (d) MSRCR result.
(a)
(b) (c)
Fig. 2.16. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result. (c) MSRCR result.
(a)
(b) (c)
Fig. 2.17. A comparison of histogram equalization and the MSRCR. (a) Original
(a)
(b) (c)
Fig. 2.18. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result. (c) MSRCR result.
(a)
(b) (c)
Fig. 2.19. A comparison of histogram equalization and the MSRCR. (a) Original image. (b) Histogram equalization result. (c) MSRCR result.
Chapter 3
Reflection Separation
3.1 Image with Reflection
When we view a scene through a transparent glass, the image is often similar to those shown in Fig. 3.1. Perceptually, we can view each one of these images as a superposition of two images: the foreground and the reflection. Hence each image could be decomposed into two transparent layers. We need a computer vision algorithm to find this decomposition. Mathematically, the problem can be modeled as follows. Given an image I ,
(
x y)
, we wish to find two layers, I1 and such that: I2( )
x y I( )
x y I( )
x yI , = 1 , + 2 , (3.1)
This problem is obviously difficult because there are two variables but only one equation. If we have no additional prior knowledge, there will be an infinite number of possible decompositions. In this chapter, we adopt an algorithm that can separate reflections from images using a single input image. The algorithm is based on a cost function: it favors decompositions which have a small number of edges and corners.
3.2 Cost Function, Edge and Corner
Consider a simple image which is the superposition of two squares in Fig. 3.2(a).
We want to decompose the image into two layers. There will be an infinite number of possible decompositions. Figs. 3.2(b)–(e) show some possible decompositions including the perceptually “correct” decomposition (Fig. 3.2(e)). What rule is the
“correct” decomposition based on? One reason is that it has the smallest total number of edges and corners among the decompositions shown in Fig. 3.2. The original image has ten corners: four corners from each square and two corners caused by the superposition of the two squares. When we decompose the image into two squares, the two corners caused by the superposition disappeared and there are just eight corners left. The decomposition shown in Figs. 3.2(b) and 3.2(d) increase the number of corners and edges. Clearly, we can also see that the decomposition shown in Fig.
3.2(e) has smaller total number of edges and corners than the other decompositions in Fig. 3.2, and it looks an appropriate decomposition perceptually.
How do we translate the preference for a small number of edges and corners into a cost function? We need operators used for edge and corner detectors; besides, we also need a mathematical form to give the cost of an image. Next we will describe how to define a cost function based on natural statistics of natural scenes.
A remarkably robust property of natural images that has received much attention lately is the fact that when derivative filters are applied to natural images, the filter outputs tend to be sparse [31], [32]. Fig. 3.3 can illustrate this fact. We take two arbitrary examples of natural images and apply a horizontal derivative filter to them respectively. We can see that the histograms of their derivative filter outputs are peaked at zero and fall off much faster than a Gaussian. As a result, the derivative filter outputs are concentrated at zero and therefore are sparse. Similar histograms are
observed for vertical derivative filters and for the gradient magnitude: ∇I .
(a)
(b)
(c)
(d)
(e)
(a) (b)
(c) (d)
Fig. 3.3. Two natural images and their filter derivative output diagrams. (a), (c) Natural image. (b), (d) Histogram of derivative filter outputs.
Subsequently we convert each of the filter outputs of Fig. 3.3 into a log histogram type and show the results in Fig. 3.5. Observing Fig. 3.5, we can find that the distributions are similar to an exponential density with exponent less than 1. For comparison, we show the log probability for densities of the form which is presented in [33]. We plot the corresponding log probabilities for
( )
x e xαp = −
>1
α , α =1 and α <1, respectively for . The results are shown in Fig. 3.4. Comparing Figs.
3.5(b) and 3.5(d) with Fig. 3.4, it can be found that the natural statistics for derivative filters has the qualitative nature of a distribution with
≥0 x
xα
e− α <1. Similar to derivative filters, the gradient magnitude also has the character. When we define a cost function, we will use this character of derivative filters and gradient magnitude operators. More descriptions about edge detector and gradient magnitude are in Sec.
3.2.1.
Now we consider the other operator, “corner detector.” In this paper we use a Harris-like operator c(x, y) as a corner detector. There are more detailed descriptions about Harris corner detector [34] in Sec. 3.2.1. Here we first treat the qualitative statistic of the operator. The output of the detector at a given location is defined as:
(a)
(b)
(c)
Fig. 3.4. The log probability for densities of the form p
( )
x =e−xα . (a) α >1 (α =2). (b) α =1. (c) α <1 (α =0.25).
(a) (b)
(c) (d)
Fig. 3.5. Two natural images and their filter derivative output diagrams. (a), (c) Natural images. (b), (d) Log histograms of derivative filter outputs.
Similarly, we show the histograms of this corner operator for the two natural images given in Fig. 3.3. The results are shown in Fig. 3.6. They are also sparse.
Again, we convert each of the outputs of Fig. 3.6 into a log histogram type in Fig. 3.7.
Likewise, we compare Fig. 3.7 with Fig. 3.4 and then find the same character as derivative filters and gradient magnitude operators. The corner detector also has the qualitative statistic of a distribution e−xα with α <1.
By applying the gradient magnitude and corner detectors described above on a number of images, it can be found that the histograms shown in Fig. 3.3, Fig. 3.5, Fig. 3.6, and Fig. 3.7 are typical. For both gradients and corner detectors the exponent was less than 1 and the exponent for the corner detector was smaller than that of the gradients. Typical exponents are 0.7 for the derivative filter and 0.25 for the corner detector [33].
The qualitative statistics observed in natural images motivated the type of the cost function. The histograms of these operators can be fit with a generalized Gaussian distribution p x
( )
∝e−xα. By using the negative log probability of these operators on natural images, the cost function for a single layer is defined as:
( ) ( ) (
histograms of the operators (gradient and corner operators) in natural images, and=15
η which is determined by the ratio of the scaling parameters in the corner and gradient distributions [33]. The cost for a two layer decomposition is the sum of the costs for the two decomposed layers:
(
1 2) ( )
1( )
2cost I I, cos= t I +cost I (3.4)
(a) (b)
(c) (d)
Fig. 3.6. Two natural images and their corner detector output diagrams. (a), (c) Natural images. (b), (d) Histograms of corner detector outputs.
(a) (b)
(c) (d)
Fig. 3.7. Two natural images and their corner detector output diagrams. (a), (c) Natural images. (b), (d) Log histograms of corner detector outputs.
Now we give an example to test the accuracy of the cost function. We apply the cost function to the image and some possible decompositions of it shown in Fig. 3.2.
The cost values of them are evaluated and shown in Fig. 3.8. Clearly, we can find that the perceptually correct decomposition in Fig. 3.8(e) indeed has smaller cost than the other decompositions in Fig. 3.8. Of course, these decompositions are just a small amount out of an infinite number of possible decompositions. In Chapter 4, we will show some examples of one dimensional decompositions.
3.2.1 Edge Detector and Corner Detector
For the cost function defined above, the derivative and corner operators are essentials. Image edges have already been defined as local variations of image intensity. Therefore, local image differentiation techniques can produce edge detector operators. The gradient of an image at location (x, y) is defined as
f x y
( )
, f f Tx y
⎡∂ ∂ ⎤
∇ = ⎢⎣∂ ∂ ⎥⎦ (3.5)
where the partial derivatives ∂ ∂f x and ∂ ∂f y can be used to detect the perpendicular and horizontal edges respectively in an image. Its magnitude is given as
( )
, f 2 f 2which can be used as an edge detector. Fig. 3.9 shows two common gradient operator masks, the left masks are ∂ ∂f y, the right ones are ∂ ∂f x. Fig. 3.10(b) shows a
(a) cost= 5693.8
cost( I1 ) = 4207.9 (b) cost( I2 ) = 3204.8
cost( I1 ) = 5693.8 (c) cost( I2 ) = 0
cost( I1 ) = 3059.9 (d) cost( I2 ) = 5786.4
cost( I1 ) = 2380.1 (e) cost( I2 ) = 2762.7
Fig. 3.8. Cost values for an input image and some decompositions. (a) Cost = 5693.8. (b) Cost= 4207.9 + 3204.8 = 7412.7. (c) Cost= 5693.8 + 0 = 5693.8. (d) Cost
= 3059.9 + 5786.4 = 8846.3. (e) Cost= 2380.1 + 2762.7 = 5142.8.
Fig. 3.9. Sobel and Prewitt edge detector masks.
Now we discuss the other operator — corner detector. One of the first approaches to find corners was to segment the image into regions, extracting the boundaries as a chain code, and then identify corners as points where boundary direction changes rapidly. This approach has been largely abandoned as it relies on the previous segmentation step, which is a complex task itself, and is also computationally expensive.
Harris and Stephens [34] presented a corner detector which is widely used today.
The Harris corner detector first computes a corner response function. Corners are detected by finding the local maxima in the corner response function. The program returns the corners in the decreasing order of their corner response function value. A Harris matrix is defined as
2
x x y
I I I
⎡ ⎤
where is the smoothing by a 2D Gaussian function. Harris corner detector
5) The remaining pixels are corner points.
However, a more efficient approach is presented by using a corner responses function which is defined as:
( ) ( )
2det trace(A)
R= A −k (3.8)
where . This function avoids the explicit eigenvalue decomposition of A, and therefore is computationally faster. The output R is a constant for each pixel, and pixels with large R are picked as corners. Fig. 3.10(c) shows the result of an image processed by the corner responses function. The corner points are detected and marked with red “+” symbols.
0.04 k≈
3.2.2 Preprocess and Anisotropic Diffusion
The decomposition of an image is based on the cost value of its two decomposed layers. However, in real images, the textures are more complicated and sometimes there may be noise existing. Consequently, when we detect edges by edge magnitude and corners with a Harris detector, it may give responses in many seemingly flat regions of the image. With these wrong responses, the output of the cost function likely has a deviation, hence the decomposition may fail. To solve this problem, we
make preprocess before apply cost function to calculate the cost value of the two decomposed layers. We first apply a nonlinear smoothing respectively to each layer and then apply Eq. (3.3) to the smoothed layers. The reason for this preprocess is that we want our edge and corner operators to return zero in regions that are nearly uniform, and hence the deviation of cost value will be decreased. Here we use a method named as anisotropic diffusion [35] to do the smoothing.
One of the most prevalent uses of image processing is for smoothing or denoising images. Denoising of images is often done with a low-pass filter, which reduces noise, but also blurs sharp features and details, such as edges. Anisotropic diffusion is a method for smoothing complex, noisy surfaces, while preserving sharp, geometric features. An anisotropic diffusion equation is defined as
( )
(
, ,) (
, ,)
It =div c x y t ∇ =I c x y t ∆ + ∇ ⋅∇ (3.9) I c I
where div represents the divergence operator, ∇ and ∆ respectively represents the gradient and Laplacian operator with respect to the space variables, and
is conduction coefficient at different locations and times. It reduces to the isotropic heat diffusion equation
(
, ,)
c x y t
It = ∆ if c I c x y t
(
, ,)
is a constant. A diffusion in which the conduction coefficient is chosen locally as a function of the magnitude of the gradient of the brightness function, i.e.,(
, ,) ( (
, ,) )
c x y t = g ∇I x y t (3.10)
will not only preserve, but also sharpen the brightness edges if the function g i
( )
is(a)
(b) (c)
Fig. 3.10. Image processed by edge detector and corner detector. (a) Input image. (b) Edge detected result. (c) Harris corner detected result.
For implementation, Eq. (3.9) can be discretized on a square lattice, with brightness values associated to the vertices, and conduction coefficients to the arcs such as shown in Fig. 3.11. A 4-nearest-neighbors discretization of the Laplacian operator can be used: subscripts respectively for North, South, East, and West. The superscript and subscripts on the square bracket are applied to all the terms it encloses, and the symbol ∇ here indicates nearest-neighbor differences defined as:
, 1,
The conduction coefficients are updated at every iteration as a function of the brightness gradient (3.10). The value of the gradient can be computed on different neighborhood structures achieving different compromises between accuracy and locality. The simplest choice is approximating the norm of the gradient at each arc location with the absolute value of its projection along the direction of the arc:
( )
chosen by the users. Nevertheless, different functions which are chosen properly sometimes give perceptually similar results. Here we use the function below:( )
( I K 2g ∇I = e− ∇ ) (3.14)
uniform regions into uniform ones and reduces noise while preserving edges, it greatly reduces the number of spurious “edges” and “corners” found by the gradient and Harris operators.
Fig. 3.11. The structure of the discrete computational scheme for simulating the diffusion equation. The brightness values I are associated with the nodes of a i j, lattice, the conduction coefficients c to the arcs.
(a) (b) (c)
Fig. 3.12. Comparison between linear smoothing and anisotropic diffusion.
(a) Original image. (b) Linear smoothing result. (c) Anisotropic diffusion result.
3.3 Discretization Using A Natural Images Database
Since there will be huge number of decompositions for an image, we have to find the minimum cost among all the possible decompositions. This problem is extremely difficult because of the huge space for the decompositions. We first consider a one dimensional subspace, a one dimensional family of decompositions for an image and their cost values will be calculated. The results are presented in Chapter 4.
Subsequently, we attempt to find out more decompositions than one dimension. We use an approach whereby the problem is first discretized using a database of natural images. Instead of optimizing over the infinite space of possible decompositions, we discretize the problem by dividing the image into small 7×7 patches, and then search the suitable decomposition for each patch. The notion of using a patch representation was motivated by the success of this approach in a number of recent vision applications [36], [37] and some feature matching methods.
For each patch, its decomposition is obtained by searching a database of natural images for pairs of patches that approximately sum to the input patch, as some examples in Fig. 3.13. The database of patches are simply all patches contained in some images chosen for decomposing an image. The more features similar to the input image the database has, the more accurate the decomposition will be. For a patch p, we need to find a pair of patches
(
p p1, 2)
such that p≈ p1+p2. For a database built with a image, a naive way of performing this search for each patch is to search all about possible pairs of patches, but this will be incredibly slow. To speed up the search, we make use of a filter bank which is a collection of directional filters at different locations, orientations and phases [38]. The640 480×
10 105× 5
derivative and its Hilbert transform. Let f x y1
( )
, =Gσ"1( )
y Gσ2( )
x and f x y2(
,)
is constant. Why is this useful from a computational point of view? The vector of filter outputs I f x y∗ i(
0, 0)
characterizes the image patch centered at(
x y0, 0)
by a set of values at a point. This is similar to characterizing an analytic function by its derivatives at a point. The collection of response images I f∗ is referred as the i hypercolumn transform of the image. The hypercolumn transform provides a convenient method for contour and texture analysis and a good local descriptor of image patches.To find patches p≈ p1+p2, we first search the database for the patches p 1 which minimize
∑
i fi∗ − ∗p fi p1 where f is a collection of 12 directional filters i which consist of two phases (even and odd), one scales, and six orientations in the upper left of the filter bank depicted in Fig. 3.14. After p is found for a patch, 1 p 2 could subsequently be chosen by performing a second search for p p− . The search 1 now requires only O(
2 10× 5)
operations rather than O(
10 105× 5)
and hence saves a lot of time.(a) (b) (c)
Fig. 3.13. Some examples for local patches decomposition. (a) Input patches. (b), (c) A possible pair of patches of decomposition for (a).
Fig. 3.14. A filter bank consisting of two phases (even and odd), three scales, and six orientations (equally spaced from 0 to π).
Chapter 4
Simulation and Results
We first show the results of face detection, MSRCR image enhancement technique, and eye detection algorithm. First, we capture frontal face images of people by a CCD camera, and then we will apply these algorithms to these images. In Sec. 4.1, the algorithm will be tested of people wearing sunglasses. We will give the results step-by-step of these algorithms. In Sec. 4.2, we will first show some results of one dimensional decompositions to verify the accuracy of the cost function for separation. Subsequently, results of decomposing some images with reflection into two layers by discretization, which makes use of patches and a collection of directional filters, will be shown.
4.1 Experiment Results of Eye Detection with Sunglasses
We take face images of two students in our laboratory to test the face detection and eye extraction with sunglasses images. The size of input images is 640×480 and the simulation is implemented on a Pentium IV 3.0GHz personal computer. These two examples are shown in Figs. 4.1–4.2. In each example, (a) is the original face image,
We take face images of two students in our laboratory to test the face detection and eye extraction with sunglasses images. The size of input images is 640×480 and the simulation is implemented on a Pentium IV 3.0GHz personal computer. These two examples are shown in Figs. 4.1–4.2. In each example, (a) is the original face image,