Chapter 1 Introduction
1.4 Face Direction Detection
As described above, we introduce Fourier descriptors for human direction detection. To improve its accuracy, it is helpful to also estimate the direction of the face, which can be derived by locating the pupils and using their geometric relationship within the face.
Combining face direction estimation with human direction estimation can improve the accuracy considerably.
When an image is taken from the front of a subject, a facial image is available. If we search for the eyes over the whole face image, the background frequently degrades the result. To reduce the eye search region and locate the eyes more accurately, we first carry out face segmentation. Many face detection methods have been proposed in recent years. According to the survey of Hjelmas and Low [18], the major approaches are listed in Table I. In this thesis, we adopt the approach proposed by Garcia and Tziritas [19], in which color features are used to identify a human face in an image. This is feasible because human faces have a characteristic color distribution that differs significantly (although not entirely) from that of background objects. Hence, this approach requires a color map that models the skin-color distribution.
After we separate the skin-color region from a mug image, it is still difficult to search for the iris, because the skin-color region is still too large. Before any component of the eye can be extracted and fitted, the eyes first have to be located in the face. Donato et al. [20] compared several techniques for recognizing upper-face and lower-face images. These techniques include optical flow, principal component analysis, and independent component analysis, among others. The best performance was achieved by Principal Component Analysis (PCA). We therefore start with the PCA technique to estimate the eye locations so that we can set up an efficient search for the iris.
TABLE I
MAJOR FACE DETECTION APPROACHES
| Authors | Year | Approach | Feature Used | Head Pose | Test Databases |
|---|---|---|---|---|---|
| Féraud et al. [21] | 2001 | Neural networks | Motion; color; texture | Frontal and profile | Sussex; CMU; Web images |
| Maio et al. [22] | 2000 | Facial templates; Hough transform | Texture; directional images | Frontal to near frontal | Static images |
| Garcia et al. [19] | 1999 | Statistical wavelet analysis | Color; wavelet coefficients | Frontal to profile | MPEG videos |
| Wu et al. [23] | 1999 | Fuzzy color models; template matching | Color | Frontal | Still color images |
| Sung et al. [24] | 1998 | Learning | Texture | Frontal | Mug shots; CCD pictures; newspaper scans |
| Yang et al. [25] | 1998 | Multiscale segmentation; color model | Skin color; intensity | Frontal | Color pictures |
| Yow et al. | 1997 | Feature; belief networks | Geometrical facial feature | Frontal to profile | CMU |
1.5 Combining Subject Direction with Face Direction
In this thesis, we propose a novel method to estimate the direction of a subject using the subject's silhouette and face region. When an image includes the whole human body, the face region is too small for its direction to be recognized. For this reason, after the subject direction has been detected from the silhouette, we drive a Pan-Tilt-Zoom (PTZ) camera to track the face and zoom in until it occupies at least 80% of the image height, which is sufficient for face direction detection.
When the subject is almost frontal to the camera, the direction estimated from the face is more accurate than that estimated from the subject's silhouette. It is therefore sensible to fuse the two estimated angles for better direction detection.
1.6 Thesis Outline
The contents of this thesis are organized as follows. Chapter 2 describes the detection algorithm for subject direction and discusses the process of estimating the direction of a subject. Chapter 3 proposes a method to estimate the face direction in order to enhance the direction estimation accuracy for the subject of interest. Chapter 4 presents our simulations and experimental results. Finally, Chapter 5 gives conclusions and discusses future work.
Chapter 2 Subject Direction Detection
In this chapter, we propose an approach that combines Fourier Descriptors (FD), based on Fourier series analysis, with Linear Discriminant Analysis (LDA). This method extracts subject features from contours and optimizes the class separability of different subjects. First, we use Fourier descriptors to discriminate between different directions of the target subject. Second, we use LDA to maximize the between-class variation while minimizing the within-class variation, which improves classification performance. Finally, we calculate the Euclidean distance between the subject in the input image and our models, which are built from Fourier descriptors, to estimate the direction angle of the subject.
2.1 Pre-processing for Subject Extraction
2.1.1 Background Modeling
We assume that the image captured by a camera can be described as

$$I_i(x, y) = S_i(x, y)\, r_i(x, y), \tag{2.1}$$

where $I_i$ is the intensity of the scene, $S_i$ is the spatial distribution of source illumination, $r_i$ is the distribution of scene reflectance, $(x, y)$ is the location of a pixel in the image, and $i$ is the image sequence index. If the camera is stationary and no moving subjects appear in the scene, the reflectance of the background remains the same at all times. That is,

$$r_i(x, y) = r(x, y). \tag{2.2}$$
Although the reflectance does not change, the illumination still varies over time.
The frame ratio between two consecutive frames can therefore be written as

$$\frac{I_i(x, y)}{I_{i-1}(x, y)} = \frac{S_i(x, y)\, r(x, y)}{S_{i-1}(x, y)\, r(x, y)} = \frac{S_i(x, y)}{S_{i-1}(x, y)},$$

where $I$ is the intensity of the captured images and $S$ is the spatial distribution of source illumination.
We propose to use the frame ratio to build the background model. Each pixel of the background scene is characterized by three statistics of a background video: the minimum intensity value $n(x, y)$, the maximum intensity value $m(x, y)$, and the maximum inter-frame ratio $d(x, y)$. Because these three values are statistical, we need a background video without any moving objects for background model training. Let $I$ be an image sequence containing $N$ consecutive frames, and let $I_i(x, y)$ be the intensity of pixel $(x, y)$ in the $i$-th frame.
2.1.2 Foreground Subject Extraction
Foreground subjects can be segmented from every frame of the video stream.
Each pixel of a video frame is classified as either a background or a foreground pixel according to the difference between the background model and the captured image frame.
We use the maximum intensity $m(x, y)$, minimum intensity $n(x, y)$, and maximum inter-frame ratio $d(x, y)$ of the trained background model to segment the foreground into a binary image $B$, where $B(x, y)$ is the gray level of a pixel in the binary image and $k$ is a threshold. The threshold $k$ is determined experimentally for each environment, and its value affects the amount of information retained in the binary image $B$.
According to the binary image $B$, we extract the region of the foreground subject to minimize the image size. Foreground region extraction can be accomplished by simply applying a threshold to the projection histograms in the X and Y directions. Fig. 2.1 shows an example of foreground region extraction. We project the binary image onto the X and Y directions; the region of interest has higher counts in the histograms. We obtain the boundary coordinates x1, x2 on the X axis and y1, y2 on the Y axis from the projection histograms, and use them as the corners of a rectangle to extract the foreground region. Fig. 2.2 shows the extracted foreground region.
Fig. 2.1 Histogram of binary image projection in X and Y direction.
Fig. 2.2 The binary image of extracted foreground region.
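The projection-based extraction described above can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function name `foreground_bbox` and the histogram threshold `min_count` are assumptions introduced here, and the binary image is represented as a plain list of rows.

```python
def foreground_bbox(binary, min_count=1):
    """Return (x1, x2, y1, y2) of the foreground region, or None if it is empty.

    binary    -- list of rows; nonzero entries are foreground pixels
    min_count -- hypothetical histogram threshold (not taken from the thesis)
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    # Projection onto the X axis: number of foreground pixels in each column.
    col_hist = [sum(1 for y in range(height) if binary[y][x]) for x in range(width)]
    # Projection onto the Y axis: number of foreground pixels in each row.
    row_hist = [sum(1 for v in row if v) for row in binary]
    xs = [x for x, count in enumerate(col_hist) if count >= min_count]
    ys = [y for y, count in enumerate(row_hist) if count >= min_count]
    if not xs or not ys:
        return None
    return xs[0], xs[-1], ys[0], ys[-1]


# A 6x8 image with a 3x3 subject and one isolated noise pixel.
image = [[0] * 8 for _ in range(6)]
for y in range(2, 5):
    for x in range(3, 6):
        image[y][x] = 1
image[0][0] = 1  # noise pixel
box = foreground_bbox(image, min_count=2)
print(box)  # (3, 5, 2, 4)
```

Raising `min_count` above 1 makes the bounding box ignore columns and rows that contain only sparse noise, which is the role the histogram threshold plays in the text.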
2.2 Model Establishment Using Fourier Descriptors
2.2.1 Fourier Descriptor Review
Fourier descriptors are a useful tool for describing the shape of a closed curve obtained from a subject contour. We can define a periodic function along the closed curve, and this function can be represented by a Fourier series. Fourier descriptors were first suggested in 1960 by Cosgriff [16], who represented an image using Fourier series.
Many researchers subsequently built on this foundation.
Zahn and Roskies [13] introduced Fourier descriptors using normalized arc length (assuming the boundary is traced counterclockwise). They expressed the closed curve as a function of arc length; the Fourier series is obtained from the accumulated change in the direction of the curve at each point.
2.2.2 Pre-processing for Subject Contour Extraction
Many operations are based on morphology, such as dilation, erosion, opening, closing, segmentation, and watersheds. We apply morphological filtering using cascaded opening and closing operations, which are composed of the dilation and erosion operators defined as follows:

$$A \oplus B = \{(x, y) + (u, v) \mid (x, y) \in A \text{ and } (u, v) \in B\}, \tag{2.6}$$

$$A \ominus B = \{w \mid B_w \subseteq A\}, \tag{2.7}$$

where $\oplus$ and $\ominus$ denote dilation and erosion, respectively, and $B_w$ is the structuring element $B$ translated by $w$. Combining the two, the opening and closing operations are defined as

$$A \circ B = (A \ominus B) \oplus B, \tag{2.8}$$

$$A \bullet B = (A \oplus B) \ominus B. \tag{2.9}$$

Examples of the opening and closing operations are illustrated in Fig. 2.3.
(a)
(b)
Fig. 2.3 Morphology operation. (a) opening operation. (b) closing operation.
The opening operation removes small noise, and the closing operation repairs holes inside the foreground. To repair a larger hole, morphological closing must use a larger mask, but a larger mask also dilates the boundary of the human foreground.
Therefore, we use a region filling method to fill the larger holes inside the foreground without dilating the boundary.
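The behavior of opening and closing can be illustrated with a small sketch on binary images stored as lists of rows. The 3x3 square structuring element and the function names are assumptions made for this example, and pixels outside the image are treated as background; the thesis does not state which mask it uses.

```python
# 3x3 square structuring element as (dy, dx) offsets (an assumed choice).
SQUARE3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def dilate(img, se=SQUARE3):
    """Set every pixel reachable from a foreground pixel by an offset in se."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in se:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] = 1
    return out


def erode(img, se=SQUARE3):
    """Keep a pixel only if se, centered on it, fits inside the foreground."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in se))
             for x in range(w)] for y in range(h)]


def opening(img, se=SQUARE3):
    """Erosion followed by dilation: removes small isolated noise."""
    return dilate(erode(img, se), se)


def closing(img, se=SQUARE3):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(img, se), se)


# A solid 5x5 block inside a 9x9 image, plus a noisy and a holed variant.
solid = [[1 if 2 <= y <= 6 and 2 <= x <= 6 else 0 for x in range(9)] for y in range(9)]
noisy = [row[:] for row in solid]
noisy[0][0] = 1   # isolated noise pixel
holed = [row[:] for row in solid]
holed[4][4] = 0   # one-pixel hole inside the foreground

print(opening(noisy) == solid)  # True
print(closing(holed) == solid)  # True
```

A hole wider than the mask would survive this closing, which is the limitation that motivates region filling for larger holes.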
Let A denote a set containing a subset whose elements are the 8-connected boundary points of a region. All nonboundary (background) points are labeled 0. Beginning with a point p inside the boundary, we assign a value of 1 to p. The following procedure then fills the region with 1's:

$$X_k = (X_{k-1} \oplus B) \cap A^c, \quad k = 1, 2, 3, \ldots$$

where $X_0 = p$, $A^c$ is the complement of $A$, and $B$ is the symmetric structuring element. The algorithm terminates at iteration step $k$ if $X_k = X_{k-1}$. The set union of $X_k$ and $A$ contains the filled set and its boundary. The dilation process alone would fill the entire area; the intersection with $A^c$ at each step limits the result to the inside of the region of interest. We can therefore extract the boundary of the original image.
Fig. 2.4 (a) Original image A. (b) Result of region filling. (c) Symmetric structuring element B.
Then we trace the contour clockwise and record the boundary coordinates of the image for Fourier descriptor analysis. Chain codes represent a boundary by a connected sequence of straight-line segments of specified length and direction. Typically, this representation is based on 4- or 8-connectivity of the segments (we use 8-connectivity in our research). The direction of each segment is coded using a numbering scheme such as the ones shown in Fig. 2.5.
(a) (b)
Fig. 2.5 (a) The direction of each segment. (b) 8-directional chain code.
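As a small illustration of 8-directional chain coding, the sketch below encodes an ordered, closed list of boundary pixels. The numbering used here (0 for east, increasing counterclockwise, with the y axis pointing up) is a common convention and is only an assumption; the scheme in Fig. 2.5 may be numbered differently.

```python
# Assumed numbering: 0 = east, increasing counterclockwise, y axis pointing up.
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(boundary):
    """Encode a closed, 8-connected boundary given as ordered (x, y) pixels."""
    codes = []
    count = len(boundary)
    for i in range(count):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % count]  # wrap around to close the contour
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError("consecutive boundary points are not 8-connected")
        codes.append(DIRECTIONS[step])
    return codes


# Clockwise traversal of a unit square (y up): up, right, down, left.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(chain_code(square))  # [2, 0, 6, 4]
```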
2.2.3 Zahn and Roskies’ Cumulative-Angle Approach
In this section, we introduce Zahn and Roskies' cumulative-angle approach to representing Fourier descriptors. We assume that $\gamma$ is a clockwise-oriented simple closed curve with parametric representation $(x(l), y(l))$. Let $(x(0), y(0))$ be the starting point, and denote $Z(l) = x(l) + j\,y(l)$, where $l$ is the arc length from the starting point to $Z(l)$ and $0 \le l \le L$. Denote the angular direction of $\gamma$ at point $l$ by the function $\theta(l)$, and let $\delta_0 = \theta(0)$ be the absolute angular direction at the starting point $Z(0)$. We now define the cumulative angular function $\phi(l)$ as the net amount of angular bend between the starting point and point $l$ (see Fig. 2.6).
Fig. 2.6 Parameters of a curve representation. $\theta(l)$ is the angular direction at point $Z(l)$; $\phi(l)$ is the cumulative angular bend between the starting point and $Z(l)$.
With this definition, $\phi(0) = 0$, and $\phi(l) + \delta_0$ is identical to $\theta(l)$ except for a possible multiple of $2\pi$. Moreover, it is not hard to see that $\phi(L) = -2\pi$, because all smooth simple closed curves with clockwise orientation have a net angular bend of $-2\pi$. As a result, $\phi(L)$ does not convey any shape information. The domain of definition $[0, L]$ of $\theta(l)$ simply contains absolute size information, and we would like to normalize it to the interval $[0, 2\pi]$, which is standard for periodic functions.
Hence we define a normalized variant $\phi^*(t)$ whose domain is $[0, 2\pi]$ and such that $\phi^*(0) = \phi^*(2\pi) = 0$:

$$\phi^*(t) = \phi\!\left(\frac{Lt}{2\pi}\right) + t,$$

and $\phi^*(t)$ is a periodic function that is invariant under translations, rotations, and changes of perimeter $L$.
We now expand $\phi^*$ as a Fourier series

$$\phi^*(t) = \mu_0 + \sum_{k=1}^{\infty} \left(a_k \cos kt + b_k \sin kt\right).$$

In polar form, the expansion is

$$\phi^*(t) = \mu_0 + \sum_{k=1}^{\infty} A_k \cos(kt - \alpha_k),$$

where $(A_k, \alpha_k)$ are the polar coordinates of $(a_k, b_k)$. The numbers $A_k$ and $\alpha_k$ are the Fourier descriptors for curve $\gamma$ and are known respectively as the $k$-th harmonic amplitude and phase angle.
According to Euler's formula, we can use the polar coordinates $x = r\cos\theta$ and $y = r\sin\theta$ to rewrite the complex parameter $z = x + jy$ as

$$z = r(\cos\theta + j\sin\theta) = r e^{j\theta}. \tag{2.14}$$

Next, we introduce the formulas derived by Zahn and Roskies for the Fourier coefficients $\{a_k, b_k\}$ and $\mu_0$ when $\gamma$ is a polygonal curve. Assuming the curve definitions shown in Fig. 2.7, it is not hard to verify that

$$\phi(l) = \sum_{i=1}^{k} \Delta\phi_i \quad \text{for } \sum_{i=1}^{k} \Delta l_i \le l < \sum_{i=1}^{k+1} \Delta l_i, \qquad L = \sum_{i=1}^{m} \Delta l_i,$$

where $m$ is the number of vertices of the polygon.
Fig. 2.7 Simple representation of a closed polygonal curve; $\Delta l_i$ is the edge length and $\Delta\phi_i$ is the angle between two edges.
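Expanding $\phi^*$, we get

$$\phi^*(t) = \mu_0 + \sum_{n=1}^{\infty} \left(a_n \cos nt + b_n \sin nt\right),$$

where

$$\mu_0 = \frac{1}{2\pi}\int_0^{2\pi} \phi^*(t)\,dt, \qquad a_n = \frac{1}{\pi}\int_0^{2\pi} \phi^*(t)\cos nt\,dt, \qquad b_n = \frac{1}{\pi}\int_0^{2\pi} \phi^*(t)\sin nt\,dt.$$

Since $\phi(l)$ is a step function, after some manipulations we obtain

$$\mu_0 = -\pi - \frac{1}{L}\sum_{k=1}^{m} l_k\,\Delta\phi_k,$$

$$a_n = -\frac{1}{n\pi}\sum_{k=1}^{m} \Delta\phi_k \sin\frac{2\pi n l_k}{L},$$

$$b_n = \frac{1}{n\pi}\sum_{k=1}^{m} \Delta\phi_k \cos\frac{2\pi n l_k}{L},$$

where $l_k = \sum_{i=1}^{k} \Delta l_i$ and $m$ is the number of polygon vertices.
The final forms of the expressions for $a_n$ and $b_n$ are especially appealing because of their similarity and because $\Delta\phi_k$ represents the angular change (bend) in the curve's direction at the $k$-th polygonal vertex, and $l_k$ is the arc length from the starting vertex to the $k$-th vertex.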
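The polygonal form of the coefficients lends itself to a short numerical sketch. The code below is an illustration under stated assumptions, not the thesis implementation: the function name is hypothetical, vertices are given in clockwise order in a coordinate system with the y axis pointing up, and each bend $\Delta\phi_k$ is wrapped into $[-\pi, \pi)$. It evaluates $a_n = -\frac{1}{n\pi}\sum_k \Delta\phi_k \sin(2\pi n l_k / L)$ and $b_n = \frac{1}{n\pi}\sum_k \Delta\phi_k \cos(2\pi n l_k / L)$ and returns the amplitude/phase pairs $(A_n, \alpha_n)$.

```python
import math


def zahn_roskies_descriptors(vertices, n_harmonics):
    """Return [(A_n, alpha_n)] for n = 1..n_harmonics of a closed polygon.

    vertices -- (x, y) tuples in clockwise order (y axis up); an assumed input format
    """
    m = len(vertices)
    # Edge i runs from vertex i to vertex i + 1 (indices taken modulo m).
    edges = [(vertices[(i + 1) % m][0] - vertices[i][0],
              vertices[(i + 1) % m][1] - vertices[i][1]) for i in range(m)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    headings = [math.atan2(dy, dx) for dx, dy in edges]
    total = sum(lengths)  # perimeter L

    # Bend at the end of edge i: change of heading, wrapped into [-pi, pi).
    bends = []
    for i in range(m):
        delta = headings[(i + 1) % m] - headings[i]
        bends.append((delta + math.pi) % (2 * math.pi) - math.pi)

    # Cumulative arc length l_k from the starting vertex to the end of edge k.
    arc, running = [], 0.0
    for length in lengths:
        running += length
        arc.append(running)

    descriptors = []
    for n in range(1, n_harmonics + 1):
        a_n = -sum(b * math.sin(2 * math.pi * n * l / total)
                   for b, l in zip(bends, arc)) / (n * math.pi)
        b_n = sum(b * math.cos(2 * math.pi * n * l / total)
                  for b, l in zip(bends, arc)) / (n * math.pi)
        descriptors.append((math.hypot(a_n, b_n), math.atan2(b_n, a_n)))
    return descriptors


# Clockwise unit square, and a translated copy scaled by 3.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
moved = [(5 + 3 * x, -2 + 3 * y) for x, y in square]
amplitudes = [round(A, 6) for A, _ in zahn_roskies_descriptors(square, 4)]
print(amplitudes)  # [0.0, 0.0, 0.0, 0.5]
```

Because only bends and normalized arc lengths enter the sums, the translated and scaled square `moved` yields the same amplitudes as `square`.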
It is clear from these expressions alone that the Fourier coefficients $(a_n, b_n)$ contain no information relating to the absolute position or rotational orientation of the curve. In the amplitude/phase-angle form of the Fourier series,

$$\phi^*(t) = \mu_0 + \sum_{n=1}^{\infty} A_n \cos(nt - \alpha_n), \tag{2.26}$$

the coefficient pair $(A_n, \alpha_n)$ gives the polar coordinates of the point $(a_n, b_n)$. The coefficient $A_n$ is called the $n$-th harmonic amplitude and $\alpha_n$ is the $n$-th harmonic phase angle. Of course, when $A_n = 0$ the $n$-th term vanishes and $\alpha_n$ is undefined.
2.2.4 Properties of Fourier Descriptors
For a specific curve, the main advantage of Fourier descriptors is their invariance to translation, rotation, and scaling of the observed object, and to the choice of starting point. The shape description thus becomes independent of the relative position and size of the object in the input image. More specifically, the distance between the object and the camera and the placement of the object relative to the optical axis of the image acquisition system do not affect the values of the Fourier descriptors. Strictly speaking, the raw Fourier descriptors are not directly insensitive to these changes; rather, each change corresponds to a simple operation on the boundary's Fourier descriptors, as summarized in Table II.
● Translation
If we translate the object, we simply add a constant to all of the values of $x(l)$ and $y(l)$. Hence, only the zero-frequency component changes; it carries the mean position only and nothing about the shape. Except for the zero-frequency component, Fourier descriptors are therefore translation invariant.
● Rotation
In complex analysis, rotation in the complex plane by an angle $\phi$ is multiplication by $e^{j\phi}$. Thus, rotation about the origin of the coordinate system only multiplies the Fourier descriptors by $e^{j\phi}$.
● Scaling
Suppose that we resize the object. This is equivalent to multiplying $x(l)$ and $y(l)$ by some constant. Hence, the Fourier descriptors are simply multiplied by the same constant.
● Start Point
By the shift property of the Fourier transform, a translation in the spatial domain is a phase shift in the transform. Thus, the harmonic amplitude is invariant to the start point, and the phase shifts accordingly.
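The four properties listed above can be checked numerically on a sampled boundary treated as complex numbers $u_k = x_k + j y_k$. The sketch below uses a plain discrete Fourier transform as the descriptor; the sample points, the function name, and the normalization by the number of samples are assumptions made for illustration.

```python
import cmath
import math


def fourier_descriptors(samples):
    """Coefficients a_n = (1/N) * sum_k u_k * exp(-j*2*pi*n*k/N)."""
    count = len(samples)
    return [sum(u * cmath.exp(-2j * math.pi * n * k / count)
                for k, u in enumerate(samples)) / count
            for n in range(count)]


def close(p, q, tol=1e-9):
    """True if two complex numbers agree within tol."""
    return abs(p - q) < tol


# An arbitrary closed boundary sampled at 6 points (illustrative data only).
boundary = [complex(2, 0), complex(1, 2), complex(-1, 1),
            complex(-2, -1), complex(0, -2), complex(1, -1)]
base = fourier_descriptors(boundary)

shift = complex(3, 4)
angle = math.pi / 5
scale = 2.5
translated = fourier_descriptors([u + shift for u in boundary])
rotated = fourier_descriptors([u * cmath.exp(1j * angle) for u in boundary])
scaled = fourier_descriptors([scale * u for u in boundary])
restarted = fourier_descriptors(boundary[2:] + boundary[:2])

# Translation changes only the zero-frequency term.
print(all(close(t, b) for t, b in zip(translated[1:], base[1:])))      # True
# Rotation multiplies every coefficient by exp(j * angle).
print(all(close(r, b * cmath.exp(1j * angle)) for r, b in zip(rotated, base)))  # True
# Scaling multiplies every coefficient by the scale factor.
print(all(close(s, scale * b) for s, b in zip(scaled, base)))          # True
# A new starting point changes phases only; the amplitudes are unchanged.
print(all(abs(abs(r) - abs(b)) < 1e-9 for r, b in zip(restarted, base)))  # True
```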
TABLE II
PROPERTIES OF FOURIER DESCRIPTORS WHEN THE BOUNDARY $u(t)$ IS CHANGED BY TRANSLATION, ROTATION, SCALING, AND STARTING POINT
| Transformation | Boundary | Fourier Descriptor |
|---|---|---|
| Identity | $u(t)$ | $a_n$ |
| Translation | $\tilde{u}(t) = u(t) + Z$ | $\tilde{a}_n = a_n + Z\,\delta_n$ |
| Rotation | $\tilde{u}(t) = u(t)\,e^{j\phi}$ | $\tilde{a}_n = a_n e^{j\phi}$ |
| Scaling or zooming | $\tilde{u}(t) = R\,u(t)$ | $\tilde{a}_n = R\,a_n$ |
| Starting point | $\tilde{u}(t) = u(t - t_0)$ | $\tilde{a}_n = a_n e^{-jnt_0}$ |
2.3 Linear Discriminant Analysis
In this section, we introduce LDA to optimize the class separability of different objects by their contours, which were described in Section 2.2. Linear Discriminant Analysis (LDA) is a classic classification method that seeks a linear transformation maximizing the between-class variance while minimizing the within-class variance; it has proved to be a suitable technique for discriminating between different pattern classes. The purpose of LDA is to find a transformation $W$ through which the input data can be mapped to a new coordinate $z$ that has larger separability.
Assume that there are $c$ training classes to be learned. Each class represents a direction of the subject, obtained from Fourier descriptor analysis. $y_{i,j}$ is the $j$-th vector in class $i$, and $N_i$ is the number of vectors in the $i$-th class. The total number of training vectors is

$$N = N_1 + N_2 + \cdots + N_c.$$

The mean vector of the entire set is given by

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{N_i} y_{i,j},$$

and the mean vector of the $i$-th class is represented by

$$\bar{y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{i,j}.$$

Let $S_w$ denote the within-class matrix and $S_b$ denote the between-class matrix; then

$$S_w = \sum_{i=1}^{c}\sum_{j=1}^{N_i} (y_{i,j} - \bar{y}_i)(y_{i,j} - \bar{y}_i)^T, \qquad S_b = \sum_{i=1}^{c} N_i\,(\bar{y}_i - \bar{y})(\bar{y}_i - \bar{y})^T,$$

where $S_w$ measures the distances among vectors within the same class and $S_b$ measures the distances between classes. The objective is to minimize $S_w$ and maximize $S_b$ simultaneously, that is, to maximize the criterion function known as the generalized Fisher linear discriminant function, given by

$$J(W) = \frac{\left|W^T S_b W\right|}{\left|W^T S_w W\right|}.$$

Because we want a maximal $J(W)$, the denominator should be as small as possible and the numerator as large as possible. We can use a Lagrange multiplier to solve for the $W$ that maximizes $J(W)$: we constrain $W$ so that the denominator equals 1 and maximize the numerator, which leads to

$$S_b W = \lambda S_w W. \tag{2.33}$$

This is a generalized eigenvalue problem. When $S_w$ is invertible, we can rewrite Eq. (2.33) as

$$S_w^{-1} S_b W = \lambda W. \tag{2.34}$$

Then we can solve for the transformation matrix $W$, whose columns are the eigenvectors of $S_w^{-1} S_b$:

$$W = \operatorname{eig}\!\left(S_w^{-1} S_b\right). \tag{2.35}$$

Using this basis, each point in the FD space can be projected to a point in the new space by the matrix $W$. Following this analysis, the different classes become well separated, which means that linear discriminant analysis is useful for separating different classes of samples.
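For intuition, the two-class, two-dimensional special case can be worked through without an eigenvalue solver: with only two classes, the leading eigenvector of $S_w^{-1} S_b$ is proportional to $S_w^{-1}(\bar{y}_1 - \bar{y}_2)$. The sketch below uses that shortcut; it is an illustrative simplification, not the multi-class procedure of this section, and the function names and sample points are hypothetical.

```python
def fisher_direction(class_a, class_b):
    """Unit vector w proportional to Sw^{-1} (mean_a - mean_b) for 2-D samples."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    mean_a, mean_b = mean(class_a), mean(class_b)
    # Within-class scatter Sw: sum over both classes of (y - mean)(y - mean)^T.
    sxx = sxy = syy = 0.0
    for points, (mx, my) in ((class_a, mean_a), (class_b, mean_b)):
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    # Multiply (dx, dy) by the inverse of the 2x2 matrix [[sxx, sxy], [sxy, syy]].
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = (wx * wx + wy * wy) ** 0.5
    return wx / norm, wy / norm


def project(points, w):
    """Scalar projection of each 2-D point onto direction w."""
    return [w[0] * x + w[1] * y for x, y in points]


# Two elongated, parallel clusters (illustrative data only).
class_a = [(0, 1.1), (1, 1.9), (2, 3.1), (3, 3.9)]
class_b = [(0, -0.9), (1, -0.1), (2, 1.1), (3, 1.9)]
w = fisher_direction(class_a, class_b)
proj_a, proj_b = project(class_a, w), project(class_b, w)
print(min(proj_a) > max(proj_b))  # True
```

The two clusters overlap when projected on either coordinate axis, but they are fully separated along the direction `w`.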
2.4 Estimation of Subject Direction by Interpolation
Using Fourier descriptors and linear discriminant analysis, we can estimate the direction of a subject of interest. After applying the FD and LDA processes to the input image, we calculate the Euclidean distance between the subject of interest and each model of the subject direction. For example, we build models for seven directions of a subject, namely 0°, 30°, 60°, 90°, 120°, 150°, and 180°, as shown in Fig. 2.8. We then calculate the Euclidean distances between the subject of interest and the seven direction models. The angles of the two models with the smallest Euclidean distances bound the possible direction range of the subject. The direction of the minimum-distance model is more probable than that of the second-minimum model, with the weighting depending on the ratio of these two minima. The subject's direction is then estimated by linear interpolation between these two direction angles.
Fig. 2.8 The representation of a 3D subject by 2D views.
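The interpolation step of this section can be sketched as follows. The exact weighting is not given in the text, so the inverse-distance weighting used here is an assumption, as are the function name and the example distances.

```python
def interpolate_direction(distances):
    """Estimate the direction angle from model distances.

    distances -- dict mapping a model angle in degrees to the Euclidean
                 distance between the input subject and that model
    """
    # Pick the two models with the smallest distances.
    ranked = sorted(distances.items(), key=lambda item: item[1])
    (angle1, d1), (angle2, d2) = ranked[0], ranked[1]
    if d1 + d2 == 0:
        return float(angle1)
    # Assumed weighting: the closer model receives the larger weight.
    weight1 = d2 / (d1 + d2)
    return weight1 * angle1 + (1 - weight1) * angle2


# Hypothetical distances to the seven direction models.
example = {0: 9.0, 30: 1.0, 60: 3.0, 90: 8.0, 120: 10.0, 150: 12.0, 180: 14.0}
print(interpolate_direction(example))  # 37.5
```

With distances 1.0 and 3.0 to the 30° and 60° models, the estimate lies a quarter of the way from 30° toward 60°.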
Chapter 3 Face Direction Detection
In this chapter, we propose a method to estimate the face direction of the subject of interest in order to enhance the accuracy of direction detection for the target subject.
Using image processing, the face angle is estimated from the geometric relationship of the pupils in the face. First, we use a skin detection formula in the YCbCr color space together with edge detection to find the head region. Second, we use PCA to find the eye positions and then calculate the pupil centers to estimate the face direction. Finally, we combine the object direction obtained from FD with the face direction to better detect the direction of the subject of interest.
3.1 Head Region Extraction
3.1.1 YCbCr Color Space
Our purpose is to segment the head region for estimating the face direction. The first stage of the head detection algorithm uses skin segmentation to reject as much of the "non-face" part of the image as possible. Two color models for segmenting the image based on skin color have been evaluated and used. The YCbCr model is naturally related to MPEG and JPEG coding. The HSV (Hue, Saturation, Value) model is used mainly in computer graphics. The use of the YCbCr and HSV color spaces for skin color segmentation is described in detail in [19]. The skin color distributions in the YCbCr and HSV color spaces are shown in Fig. 3.1.
From Fig. 3.1, we can conclude the skin color distribution in YCbCr color … image to the YCbCr color space is that the effect of luminosity can be decoupled from the color components during image processing. For this reason, we use the YCbCr color space for skin color region detection.
(a) (b)
Fig. 3.1 Sample skin color distributions in (a) the HSV color space and (b) the YCbCr color space.
In the RGB domain, each component of a picture element (red, green, and blue) has its own strength value. In the YCbCr domain, however, a pixel is described by the Y component and the Cb (blue) and Cr (red) components. The YCbCr color space, which is a revision of the YUV color space, separates a luminance component (Y) from two chromatic components, blueness (Cb) and redness (Cr). The luminance is thus well separated from the chrominance. A conversion matrix is used to convert the RGB image into its Y, Cb, and Cr components.
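As an illustration of such a conversion, the sketch below uses the full-range BT.601 coefficients adopted by the JPEG/JFIF format. These coefficients are an assumption made for this example; the conversion matrix actually used in this thesis may differ in scaling or offsets.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB values to (Y, Cb, Cr) with JFIF full-range coefficients.

    The coefficients are the JPEG/JFIF ones and are assumed here; they are
    not taken from the thesis.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# A neutral gray carries no chrominance: Cb and Cr stay at the 128 midpoint.
gray = rgb_to_ycbcr(100, 100, 100)
# A saturated red pushes Cr above 128 and Cb below 128.
red = rgb_to_ycbcr(255, 0, 0)
print([round(v, 3) for v in gray])
print([round(v, 3) for v in red])
```

Because the Cb and Cr rows of the matrix each sum to zero, any gray level maps to the same chrominance pair, which is what allows luminance to be handled separately from color.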
3.1.2 Combine the Region of Skin Color Detection and Edge Detection
The first stage of the algorithm is to classify the pixels of the input image into skin and non-skin regions. To do this, we obtain a skin-color reference map in the YCbCr color space. It has been shown that a skin-color region can be identified by the presence of a certain set of chrominance values (i.e., Cb and Cr) that are narrowly and consistently distributed in the YCbCr color space. In Fig. 3.1(b), the intersections of the adjusted bounding planes with the CbCr plane for Y = 160 are displayed. We report the respective ranges of Cb and Cr values that correspond to skin color and define the bounding planes, which subsequently define our skin-color reference map, choosing the ranges found to be most suitable for all the input images.
As one can notice, there are two sets of eight equations, depending on the value of Y relative to 128, which approximate the distribution borders in the light and dark extreme cases. An example