Identity verification by relative 3-D structure using multiple facial images
Jau Hong Kao*, Yen Heng Chen, Jen Hui Chuang
Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, Province of China
Received 3 December 2003; received in revised form 27 October 2004
Available online 15 December 2004
Abstract
Identity verification is one of the critical issues in the security sector and has been emerging as an active research area. In recent years, technologies using biological features to address problems of identity verification have attracted considerable research interest. For example, fingerprint recognition, voice recognition and recognition of the pattern of blood vessels in the retina have spawned many commercial applications. However, special and expensive equipment such as fingerprint readers and iris scanners is often required, and people occasionally have to assume uncomfortable poses. This paper presents a study on a computer vision technique and its application in face recognition to achieve identity verification. With multiple facial images taken from different view angles, relative affine structures are computed and used as measurements. To that end, the explicit relationship between the relative affine structure and the cross ratio, which is a view invariant under perspective projection, is also addressed. The proposed method neither requires camera calibration nor reconstructs 3D models. According to simulation results, the developed approach can achieve satisfactory results given the feature points of facial images.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Identity verification; Relative affine structure; Cross ratio; Perspective projection
1. Introduction
Machine recognition of faces has been a very active research topic in recent years (Belhumeur et al., 1997; Chellappa et al., 1995; Samal and Iyengar, 1992; Zhang et al., 1997). Face recognition technology for still and video images has numerous potential commercial and law enforcement applications. These applications range from static matching of well-formatted photographs such as passports, credit cards, driver's licenses, and mug shots, to real-time matching of surveillance video images, each presenting different constraints in terms of processing requirements. Although humans seem to recognize faces in cluttered scenes with relative ease, machine recognition, which often spans several disciplines such as image processing, pattern recognition, computer vision, and neural networks, is a much more daunting task. In particular, the problem can be formulated as follows: given still or video images of a scene, identify one or more persons in the scene using a stored database of faces. A complete face recognition system generally includes two main stages. The first stage is the face detection stage, which determines the existence of one or more faces in an image. Techniques used in this stage involve segmentation of faces from cluttered scenes and extraction of features from the face region. The challenges are mainly due to the fact that the position, orientation and size of face regions in an arbitrary image are usually unknown (Rowley et al., 1998; Yang and Huang, 1994; Jeng et al., 1998). A survey of face detection techniques can be found in (Kriegman et al., 2002). The second stage is the recognition stage, which deals with the identification and matching problems. The goal is to determine the identities of the target faces obtained in the first stage. A brief survey of important work on the recognition stage published in the engineering literature in recent years is provided in what follows.
0167-8655/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2004.11.008
This work is partly supported by National Science Council of Taiwan, Republic of China, under grant no. NSC92-2213E009006 and by Ministry of Economic Affairs, Taiwan, Republic of China, under grant no. g3-EC17A02S1032.
* Corresponding author. E-mail addresses: gis88804@cis.nctu.edu.tw (J.H. Kao), jchuang@cis.nctu.edu.tw (J.H. Chuang).
Most existing face recognition algorithms are 2D-based. In terms of the nature of the facial features utilized, these 2D algorithms can generally be divided into two major categories: structure-based approaches and statistics-based approaches. Structure-based approaches use structural facial features, which are mostly local structures, e.g., the shapes of the mouth, nose, and eyes (Mirhosseini and Yan, 1998; Lades et al., 1994; Kanade, 1974; Phillips, 1998). In (Kanade, 1974), an automated recognition system that uses a top-down control strategy directed by a generic model of expected feature characteristics is developed. Lades et al. (1994) proposed an elastic graph matching model which extracts the feature vectors from image lattices based on a set of 2D Gabor filters. The main advantage of a structure-based face recognition method is its low sensitivity to irrelevant data, e.g., moving hair or background, since it only handles data of interest instead of using all image data indiscriminately. The main disadvantage of such approaches is the high complexity of feature extraction.
The statistics-based approaches basically use the whole 2D image as facial features (Belhumeur et al., 1997; Bichsel and Pentland, 1994; Lin et al., 1997; Liao et al., 1988). In this category of approaches, principal component analysis (PCA) is of particular importance (Hotta, 2003). The principal components of training face images, e.g., Eigenfaces (Turk and Pentland, 1991; Pentland and Turk, 1991), are calculated and then used as an orthonormal basis. The complete space can be represented effectively by a significantly smaller subset of these orthonormal facial images, and the dimension of the feature space of facial images is thus reduced. Moreover, theoretical neuroscience has contributed to accounting for the view-invariant perception of universals, such as the explicit perception of featural parts and wholes in visual scenes; view invariance is also the underlying idea of our work on identity verification. A survey of recent developments in theoretical neuroscience for machine vision can be found in (Colombe, 2003). The unsupervised learning methods surveyed there are used to build predictive perceptual models of the spatial and temporal statistical structure in natural visual scenes. In particular, given the spatio-temporal continuity of the statistics of sensory input, invariant object recognition might be implemented using a learning rule that uses a trace of previous neural activity, capturing the same object under different transforms over a short time scale. By first relating a modified Hebbian rule to error correction rules and exploring a number of error correction rules that can be applied to invariant pattern recognition, Rolls and Stringer (2001) developed learning rules related to temporal difference learning. The analysis of temporal difference learning provides a theoretical framework for better understanding the operation and convergence properties of rules useful for learning invariant representations. In contrast to structure-based approaches, statistics-based ones are more straightforward and simple. However, important local features may end up being given only a small weight. As for theoretical neuroscience, it is not yet obvious whether the full power of these learning rules is expressed in the brain, and practical applications in face recognition are needed to understand their performance. The work in (Rolls and Stringer, 2001) provides suggestions about how they might be implemented. Although the above 2D-based face recognition approaches produce satisfactory results under normal conditions, their performance can deteriorate quickly under varying lighting conditions or large changes of the viewing geometry.
As face recognition technology is an essential tool in law enforcement agencies' efforts to combat crime, fake or duplicated facial images, which can easily cheat 2D-based facial recognition systems, raise problems of interest (Chellappa et al., 1995). To avoid such problems, a few 3D model-based face recognition methods have been proposed, wherein 3D feature points are reconstructed to provide important information for facial recognition. In (Atick et al., 1995), a method based on the Karhunen-Loève expansion is developed to reconstruct 3D face features. The method is claimed to be independent of lighting conditions. In (Ya and Zhang, 1998), the reconstruction of the face surface is made rotation-invariant. A similar approach based on a depth map obtained from stereo images to perform face segmentation and recognition can be found in (Lengagne et al., 1996). In (Eriksson and Weber, 1999), a model-matching approach is provided to reduce the computational cost of 3D-based facial recognition algorithms.
In this paper, we propose a novel approach to identify a person with facial images using 3D information of facial feature points. Three reference points are first extracted to construct a reference plane in every image. By calculating, for each relevant feature point, a view-invariant relative depth, i.e., the relative affine structure with respect to the obtained reference plane introduced in (Shashua and Navab, 1996), an efficient face recognition algorithm is developed using this robust measurement. Compared with other 3D approaches that require specific structures in Euclidean space (Atick et al., 1995; Ya and Zhang, 1998), the proposed method uses only a few facial feature points and requires no camera calibration. In addition, no iterative training is required, which avoids the convergence issues of neural network approaches. Experimental results show that the developed approach performs satisfactorily on an experimental facial image database.
In the following sections, we first introduce the related projective geometry for one and two cameras. The geometrical relationships between two cameras, such as parallax and relative affine structure, are discussed in Section 3, together with the geometrical meaning of such a structure, which is expressed in terms of an invariant under perspective projection, i.e., the cross ratio. Algorithms for face recognition using relative affine structure are presented in Section 4. Simulation results for an experimental facial image database are given in Section 5. Finally, conclusions are given in Section 6.
2. Projective geometry for one and two cameras
The basic procedure of projecting 3D points onto an image by a perspective camera can be described as
$$m \cong PM, \tag{1}$$
where $\cong$ denotes equality up to a scaling factor, $P$ is the $3 \times 4$ projection matrix, and $M = [X\ Y\ Z\ 1]^T$ and $m = [x\ y\ 1]^T$ represent the homogeneous coordinates of a 3D world point and the corresponding image point, respectively. In general, the image coordinate system is defined in terms of image pixels. The general form of the projection matrix can be represented as
$$P_{euc} \cong K P_0 T = \begin{bmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{bmatrix} [\, I \mid 0 \,] \begin{bmatrix} R & t \\ 0_3^T & 1 \end{bmatrix}. \tag{2}$$
In (2), $K$ gives the intrinsic parameters of the camera, i.e., of the imaging system. As for $T$, it describes the location and orientation of the camera with respect to the world coordinate system. It is a $4 \times 4$ matrix describing the pose of the camera in terms of a rotation $R$ and a translation $t$, which give the extrinsic parameters. For an ideal camera model, both $K$ and $T$ are identity matrices and (2) becomes
$$m \cong P_0 M. \tag{3}$$
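As a small numerical illustration of (1)-(3), the following sketch builds the projection matrix of (2) and projects one 3D point. The intrinsic values in `K` and the test point `X` are hypothetical numbers chosen for illustration only; they are not taken from the experiments in this paper.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build P = K [I|0] T with T = [[R, t], [0, 1]], following Eq. (2)."""
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return K @ P0 @ T

def project(P, X):
    """Project a 3D point X = (X, Y, Z) and normalise the result to m = (x, y, 1)."""
    m = P @ np.append(X, 1.0)
    return m / m[2]

# Hypothetical intrinsics: focal lengths 800, zero skew, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
X = np.array([0.2, -0.1, 4.0])  # hypothetical 3D point in front of the camera

# Ideal camera of Eq. (3): K and T are identities, so m = (X/Z, Y/Z, 1).
m_ideal = project(projection_matrix(np.eye(3), np.eye(3), np.zeros(3)), X)
# Same pose with the intrinsics above: the result is expressed in pixels.
m_pixel = project(projection_matrix(K, np.eye(3), np.zeros(3)), X)
print(m_ideal, m_pixel)
```

With the ideal camera the image point is simply the 3D point divided by its depth; the intrinsic matrix then rescales and shifts it into pixel coordinates.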
Consider two cameras taking pictures of an object, as illustrated in Fig. 1, wherein $C$ and $C'$ are the optical centers of the two cameras and $v$ and $v'$ are their associated image planes, respectively. The projection of $C'$ on $v$ observed from $C$, $e = PC'$, and the projection of $C$ on $v'$ observed from $C'$, $e' = P'C$, are defined as the epipoles of the two cameras, respectively. Without loss of generality, we assume that the world coordinate system is aligned with the image coordinate system of camera $C$; thus the projection matrices for $C$ and $C'$ become
$$P = K_{3\times3}[\, I_{3\times3} \mid 0 \,] = [\, K \mid 0 \,], \tag{4}$$
$$P' = K'_{3\times3}[\, R_{3\times3} \mid t_{3\times1} \,] = [\, K'R \mid K't \,]. \tag{5}$$
In addition, we have, by definition,
$$PC = K_{3\times3}[\, I_{3\times3} \mid 0_{3\times1} \,]\, C_{4\times1} = 0$$
or
$$C \cong [0\ 0\ 0\ 1]^T.$$
Since $e'$ is the projection of $C$ on $v'$,
$$e' = P'C = K't. \tag{6}$$
Consider a 3D point $M$ whose depth is $z$ with respect to the camera coordinate system of camera $C$. Its projection on the image plane $v$, from (1) and (4), is equal to
$$m \cong PM = K\tilde{M} \quad \text{with} \quad M = \begin{bmatrix} \tilde{M} \\ 1 \end{bmatrix} = \begin{bmatrix} zK^{-1}m \\ 1 \end{bmatrix}$$
if $m$ is normalized as $(x, y, 1)^T$. The projection on image plane $v'$ is then
$$m' \cong P'M \cong K'RK^{-1}m + \frac{1}{z}K't. \tag{7}$$
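The relations (4)-(7) can be checked numerically. The sketch below is only a sanity check under assumed values: the matrices `K1`, `K2`, the rotation angle, the translation `t` and the point `X` are hypothetical and serve purely as an illustration.

```python
import numpy as np

# Hypothetical calibration and pose, used only to exercise Eqs. (4)-(7).
K1 = np.array([[1.2, 0.0, 0.1], [0.0, 1.1, -0.05], [0.0, 0.0, 1.0]])
K2 = np.array([[1.0, 0.02, 0.0], [0.0, 1.3, 0.1], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about the y axis
t = np.array([1.0, 0.2, 0.5])

P1 = np.hstack([K1, np.zeros((3, 1))])        # Eq. (4): P  = [K | 0]
P2 = np.hstack([K2 @ R, (K2 @ t)[:, None]])   # Eq. (5): P' = [K'R | K't]

C = np.array([0.0, 0.0, 0.0, 1.0])            # optical centre of the first camera
e2 = P2 @ C                                   # Eq. (6): e' = P'C = K't

X = np.array([0.3, -0.2, 3.0])                # 3D point with depth z = 3
z = X[2]
m = P1 @ np.append(X, 1.0) / z                # first-view point normalised to (x, y, 1)
m2_direct = P2 @ np.append(X, 1.0)            # second-view point by direct projection
m2_eq7 = K2 @ R @ np.linalg.inv(K1) @ m + (K2 @ t) / z   # right-hand side of Eq. (7)

# Equality up to scale is tested through a vanishing cross product.
print(np.cross(m2_direct, m2_eq7))
```

Because (7) holds only up to a scaling factor, the two vectors are compared through their cross product rather than element by element.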
With the above geometrical relationships and coordinate transformations between two cameras, Shashua and Navab (1996) derived the view-invariant relative affine structure. The following section provides a brief review, together with its explicit geometric meaning.
3. Relative affine structure and its geometric meaning
In (Shashua and Navab, 1996), an affine framework for perspective views is proposed which is captured by a simple equation based on an invariant called relative affine structure. It is shown in (Shashua and Navab, 1996) that the framework unifies projection tasks including Euclidean, projective and affine in a natural and simple way. The algebraic form of the relative affine structure, given clearly in (Shashua and Navab, 1996), is reviewed next; the direct relationship between the relative affine structure and a view-invariant cross ratio under perspective projection is then derived at the end of this section.
Given a reference plane $\pi$, let the image points $m$ and $m'$ be the projections of a 3D point $M_\pi \in \pi$ on image planes $v$ and $v'$, respectively. The homography induced by $\pi$ can be obtained from $M_\pi = H_1 m$ and $M_\pi = H_2 m'$ as follows:
$$m' = H_2^{-1} M_\pi = H_2^{-1} H_1 m = H_\pi m. \tag{8}$$
Since $H_\pi$ has eight independent entries (nine minus a scale factor), it can be determined uniquely by solving a system of linear equations obtained from three point correspondences in general position on $\pi$ and the relationship $e' \cong H_\pi e$. Moreover, once $H_\pi$ is computed, we can use it to determine positions of points on $\pi$ from a single image.
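The homogeneous coordinates of $\pi$ can be written as
$$\pi = \begin{bmatrix} n_{3\times1} \\ d_\pi \end{bmatrix}, \tag{9}$$
where $n$ and $d_\pi$ describe the normal vector and the depth of $\pi$, respectively. For the projection $m$ of $M_\pi$ on the image plane $v$, we have
$$m = PM_\pi = [\, K \mid 0 \,] M_\pi.$$
Since the depth of $M_\pi$ is unknown, we can assume that
$$M_\pi = \begin{bmatrix} (K^{-1}m)_{3\times1} \\ \rho \end{bmatrix}. \tag{10}$$
On the other hand, since $M_\pi$ is on $\pi$, i.e., $\pi^T M_\pi = 0$, we have
$$\rho = -\frac{1}{d_\pi} n^T K^{-1} m. \tag{11}$$
Now, by projecting $M_\pi$ on $v'$, we have
$$m' = H_\pi m \cong P'M_\pi = K'\left(R - \frac{t n^T}{d_\pi}\right)K^{-1}m. \tag{12}$$
For more general scenes wherein not all of the 3D points are coplanar, parallax will be produced. For instance, $M$ is a 3D point which is not on the plane $\pi$ in Fig. 2, and $m''$ and $H_\pi m$ are the projections of $M$ and $M_\pi$ on $v'$, respectively. From (7), (8), (12) and $e' = K't$, we have
$$m'' \cong K'RK^{-1}m + \frac{1}{z}K't = H_\pi m + \frac{z n^T K^{-1} m + d_\pi}{d_\pi z}\, e'. \tag{13}$$
For a point $M = [(zK^{-1}m)^T \;\; 1]^T$ which is not on the reference plane $\pi$, the distance from $M$ to $\pi$ is equal to
$$d = \pi^T M = z n^T K^{-1} m + d_\pi. \tag{14}$$
Substituting (14) into (13), we have
$$m'' \cong H_\pi m + \frac{d}{d_\pi z}\, e' = H_\pi m + \beta e'. \tag{15}$$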
Since the value of the parallax term $\beta$ in (15) is normalized, $d_\pi$ can be dropped out, as stated in (Shashua and Navab, 1996). If we let $\beta_0 = 1$ for a reference point $M_0$ which is not on the reference plane (see Fig. 3), we are left with $d_\pi = d_0 / z_0$ and (15) can be rewritten as
$$m'' \cong H_\pi m + \frac{z_0}{z}\,\frac{d}{d_0}\, e' = H_\pi m + k e' \tag{16}$$
with $k$ being the relative affine structure. In the following paragraph, we will investigate a different interpretation of relative affine structure and its relationship with cross ratio.
Fig. 2. An example of parallax. M is a point which is not on the reference plane $\pi$.
Fig. 3. The geometry of the relative affine structure. z and z0
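The plane homography of (12), the parallax decomposition of (15) and the normalisation leading to (16) can be verified with a short numerical sketch. All numbers below (calibration matrices, pose, plane points, `M0`, `M`) are hypothetical values chosen for illustration; the plane is assumed not to pass through the optical centre $C$, so that $d_\pi \neq 0$.

```python
import numpy as np

# Hypothetical two-camera setup (first camera at the world origin).
K1 = np.array([[1.2, 0.0, 0.1], [0.0, 1.1, -0.05], [0.0, 0.0, 1.0]])
K2 = np.array([[1.0, 0.02, 0.0], [0.0, 1.3, 0.1], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.2, 0.5])

# Reference plane pi = (n, d_pi) through three hypothetical 3D points: n.X + d_pi = 0.
plane_pts = np.array([[-1.0, 0.0, 4.0], [1.0, 0.0, 4.0], [0.0, 1.0, 3.8]])
n = np.cross(plane_pts[1] - plane_pts[0], plane_pts[2] - plane_pts[0])
d_pi = -n @ plane_pts[0]

H_pi = K2 @ (R - np.outer(t, n) / d_pi) @ np.linalg.inv(K1)   # Eq. (12)
e2 = K2 @ t                                                  # Eq. (6)

def view1(X):
    """First-view image point normalised to (x, y, 1)."""
    return K1 @ X / X[2]

def view2(X):
    """Second-view image point (arbitrary scale)."""
    return K2 @ (R @ X + t)

def beta(X):
    """Parallax term d / (d_pi z) of Eq. (15), with d taken from Eq. (14)."""
    return (n @ X + d_pi) / (d_pi * X[2])

M0 = np.array([0.0, 0.0, 3.0])    # reference point off the plane
M = np.array([0.3, -0.4, 3.4])    # another point off the plane

# Points on the plane are transferred by H_pi alone (zero parallax).
on_plane_residual = max(np.linalg.norm(np.cross(view2(P), H_pi @ view1(P)))
                        for P in plane_pts)
# Off-plane points need the parallax term beta * e'.
parallax_residual = np.linalg.norm(np.cross(view2(M), H_pi @ view1(M) + beta(M) * e2))

# Normalising by the reference point gives k of Eq. (16).
k = beta(M) / beta(M0)
d, d0 = n @ M + d_pi, n @ M0 + d_pi
k_closed_form = (M0[2] / M[2]) * (d / d0)
print(on_plane_residual, parallax_residual, k, k_closed_form)
```

Both residuals vanish up to rounding, and the ratio of parallax terms coincides with $(z_0/z)(d/d_0)$, which involves only quantities tied to the first camera, the plane and the reference point.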
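In (16), it is not difficult to see that $k$ is an invariant quantity since the variables $z_0$, $z$, $d_0$, $d$ are determined by camera $C$ (together with the reference plane $\pi$ and the reference point $M_0$) and do not depend on the second camera. Consider Fig. 3. By extending $MM_0$, we obtain two intersection points $m$ and $M_\pi$, which are on $v$ and the reference plane $\pi$, respectively. By triangular similarity, we have
$$k = \frac{z_0}{z}\cdot\frac{d}{d_0} = \frac{\overline{mM_0}}{\overline{mM}}\cdot\frac{\overline{MM_\pi}}{\overline{M_0M_\pi}} = CR(m, M, M_0, M_\pi). \tag{17}$$
This leads to the conclusion that the relative affine structure is in fact a measure of cross ratio.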
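The view invariance of the cross ratio itself can be illustrated with a minimal sketch. The function `cross_ratio` follows the grouping of the four points used in (17); the 3D line, the two cameras and all numerical values are hypothetical and only demonstrate the general property, not the specific construction of Fig. 3.

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """CR(a, b, c, d) = (ac * bd) / (ab * cd) for signed positions on a line,
    i.e. the same grouping of the four points as in Eq. (17)."""
    return ((c - a) * (d - b)) / ((b - a) * (d - c))

# Four collinear 3D points, given by their parameters along a hypothetical line.
s = np.array([0.0, 1.0, 2.5, 4.0])
origin, direction = np.array([-0.5, 0.2, 3.0]), np.array([0.3, -0.1, 0.4])
points = origin + s[:, None] * direction

def image_positions(K, R, t):
    """Project the points and return their signed positions along the image line."""
    proj = (K @ (R @ points.T + t[:, None])).T
    px = proj[:, :2] / proj[:, 2:3]
    v = px[-1] - px[0]
    return (px - px[0]) @ v

cy, sy = np.cos(0.3), np.sin(0.3)
R2 = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
K2 = np.array([[1.3, 0.0, 0.2], [0.0, 1.1, -0.1], [0.0, 0.0, 1.0]])

cr_3d = cross_ratio(*s)
cr_view1 = cross_ratio(*image_positions(np.eye(3), np.eye(3), np.zeros(3)))
cr_view2 = cross_ratio(*image_positions(K2, R2, np.array([1.0, 0.2, 0.5])))
print(cr_3d, cr_view1, cr_view2)
```

The three values agree although the individual image distances differ from view to view, which is the property that makes a cross-ratio-based measurement attractive for verification.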
Algorithm 1. Computation of relative affine structure for n pairs of image points
(1) Calculate the fundamental matrix $F$ with 8 pairs of correspondences.
(2) Derive the epipoles $e$ and $e'$ using $F^T e' = 0$ and $Fe = 0$.
(3) Derive the homography $H_\pi$ of the reference plane with an epipole and 3 pairs of point correspondences.
(4) Choose a pair of corresponding points $m_0$ and $m'_0$, where $m_0$ and $m'_0$ are image points on the left image and the right image, respectively.
(5) Scale $H_\pi$ such that $m'_0 \cong H_\pi m_0 + e'$ ($k_0 = 1$).
(6) Obtain $k_i$ with $m'_i \cong H_\pi m_i + k_i e'$, $1 \le i \le n - 1$.
Since it is view-invariant, $k$ can be used as a useful feature to describe object points. Algorithm 1 summarizes the process to calculate the relative affine structure for n pairs of image points. By calculating relative affine structures of facial features of persons, we have developed an identity verification system based on face recognition using $k$, as discussed next.
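The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the experiments: the data are noise-free and synthetic, the fundamental matrix is estimated with a plain linear eight-point solve (real pixel data would normally call for coordinate normalisation), the homography is obtained by a direct linear transform from the three plane points plus the epipole pair, and all function names, camera parameters and 3D points are hypothetical.

```python
import numpy as np

def null_vector(A):
    """Unit vector v minimising |A v| (last right singular vector)."""
    return np.linalg.svd(A)[2][-1]

def fundamental_matrix(m1, m2):
    """Step 1: linear eight-point estimate of F with m2_i^T F m1_i = 0."""
    A = np.array([np.kron(q, p) for p, q in zip(m1, m2)])
    return null_vector(A).reshape(3, 3)

def homography(src, dst):
    """Step 3: DLT estimate of H with dst_i ~ H src_i from four correspondences."""
    rows, z = [], np.zeros(3)
    for p, q in zip(src, dst):
        rows += [np.concatenate([z, -q[2] * p, q[1] * p]),
                 np.concatenate([q[2] * p, z, -q[0] * p]),
                 np.concatenate([-q[1] * p, q[0] * p, z])]
    return null_vector(np.array(rows)).reshape(3, 3)

def relative_affine_structure(m1, m2, plane_idx, ref_idx):
    """Steps 1-6 of Algorithm 1; rows of m1 must be normalised to (x, y, 1)."""
    F = fundamental_matrix(m1, m2)
    e1, e2 = null_vector(F), null_vector(F.T)        # step 2: F e = 0, F^T e' = 0
    H = homography(np.vstack([m1[plane_idx], e1]),
                   np.vstack([m2[plane_idx], e2]))   # step 3
    a = np.cross(m2[ref_idx], H @ m1[ref_idx])       # steps 4-5: scale H so k_0 = 1
    b = np.cross(m2[ref_idx], e2)
    H = H * (-(a @ b) / (a @ a))
    k = []
    for p, q in zip(m1, m2):                         # step 6: q x (H p + k e') = 0
        a, b = np.cross(q, H @ p), np.cross(q, e2)
        k.append(-(a @ b) / (b @ b))
    return np.array(k)

def rotation(ax, ay):
    """Rotation about x by ax followed by rotation about y by ay (radians)."""
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Ry @ Rx

# Synthetic stand-ins: three plane points (ear lobes, chin), a reference point
# (nose tip) and eight further feature points, all hypothetical.
rng = np.random.default_rng(0)
plane_pts = np.array([[-1.0, 0.0, 4.0], [1.0, 0.0, 4.0], [0.0, 1.0, 3.8]])
ref_pt = np.array([[0.0, 0.0, 3.0]])
others = np.column_stack([rng.uniform(-0.8, 0.8, 8), rng.uniform(-0.8, 0.8, 8),
                          rng.uniform(3.0, 3.7, 8)])
X = np.vstack([plane_pts, ref_pt, others])

def view(K, R, t):
    """Image points of X for a camera (K, R, t), normalised to (x, y, 1)."""
    m = (K @ (R @ X.T + t[:, None])).T
    return m / m[:, 2:3]

K1 = np.array([[1.2, 0.0, 0.1], [0.0, 1.1, -0.05], [0.0, 0.0, 1.0]])
K2 = np.array([[1.0, 0.02, 0.0], [0.0, 1.3, 0.1], [0.0, 0.0, 1.0]])
K3 = np.array([[0.9, 0.0, -0.1], [0.0, 1.0, 0.05], [0.0, 0.0, 1.0]])
m1 = view(K1, np.eye(3), np.zeros(3))
m2 = view(K2, rotation(0.1, 0.3), np.array([1.0, 0.2, 0.5]))
m3 = view(K3, rotation(-0.25, -0.2), np.array([-0.8, 0.6, 0.3]))

k_12 = relative_affine_structure(m1, m2, [0, 1, 2], 3)
k_13 = relative_affine_structure(m1, m3, [0, 1, 2], 3)

# Closed form (z_0 / z)(d / d_0) of Eq. (16), computed from the known 3D points.
n = np.cross(plane_pts[1] - plane_pts[0], plane_pts[2] - plane_pts[0])
d = (X - plane_pts[0]) @ n
k_true = (X[3, 2] / X[:, 2]) * (d / d[3])
print(np.round(k_12, 4))
```

In this synthetic setting, `k_12` and `k_13` are computed from two different second views with different, unknown calibrations, and both can be compared with `k_true`; the three plane points yield 0 and the reference point yields 1, as in Table 1.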
4. Face recognition using relative affine structures
With the properties of the view-invariant relative affine structure investigated in the previous section, this section presents the proposed approach to face recognition using such invariants. Recall that the relative affine structure of an object point depends only on the configuration of the first camera $C$, the position of the reference plane $\pi$ and the reference point $M_0$. So, two facial images are first used to derive the relative affine structure of each feature point. The first image is designated as the reference image, and the extracted facial features are stored together with the obtained relative affine structures. To verify the identity of a new facial image, a new set of relative affine structures is obtained from the reference facial image and the new image. The similarity between the stored relative affine structures and the new set of relative affine structures is evaluated. Finally, the identity is verified by checking whether the similarity is higher than a specified threshold.
The extraction of essential features from two facial images and the procedure adopted in this paper to obtain relative affine structures from the extracted features are explained here. To focus on the correctness of the theory, feature points are obtained manually from facial images taken from different points of view. On each given face image, fifteen feature points including eye and mouth corners, nose tip, ear lobes, etc. are extracted as shown in Fig. 4(b). The image of the front view of person A is labeled as Af, while the upward and downward looking facial images are labeled as Au and Ad, respectively. In the same manner, three images of each of the other persons are also taken. For example, Fig. 5 shows the images obtained for person B.
Table 1 shows the relative affine structures obtained for persons A and B with Au and Bu being the reference images, respectively. Since the reference plane is defined by the left ear lobe (point 14), the right ear lobe (point 13) and the chin (point 15), as illustrated in Fig. 6, the relative affine structure values of these three points are all zero. The value of the relative affine structure of the nose tip (point 12), which is the reference point $M_0$, is defined as 1 for normalization. Since the distance from the camera to a person is usually several meters, $z_0/z$ in (17) is close to 1. Thus, the values of the other relative affine structures given in (17) are close to $d/d_0$. From Fig. 6, we can see that the ratio for the eye corner is close to unity, the ratio for the mouth corner is about 0.4, while the ratios for the upper and lower lips are about 0.65 and 0.45, respectively.
In our experiments, we use six groups of facial images for persons A through F (see Fig. 7 for facial images Cf through Ff). Each group consists of three images from three different points of view. With a personal computer equipped with a 333 MHz Pentium II processor and 128 MB of memory, the program implemented with MATLAB 6.1 under Microsoft Windows 2000 spends 0.1 second to obtain the relative affine structures for each data set, e.g., Au_Af with Au being the reference image. A database is used to store such information obtained from the facial images. The details of the verification stage using this database and a method to improve the performance of the verification are given in the next section.
Fig. 4. Face images of person A. From left to right, the images are labeled as Au, Af, and Ad, respectively.
Fig. 5. Face images of person B. From left to right, the images are labeled as Bu, Bf, and Bd, respectively.
Table 1
Relative affine structures obtained for persons A ($k_{1i}$) and B ($k_{2i}$) using Au_Af and Bu_Bf, respectively
i   Feature point             k1i     k2i
1   Right eye corner (outer)  0.9951  1.0111
2   Right eye corner (inner)  0.9050  1.0391
3   Left eye corner (inner)   0.8112  1.0400
4   Left eye corner (outer)   0.7242  1.0590
5   Mouth corner (right)      0.4358  0.4594
6   Mouth corner (left)       0.4228  0.3430
7   Upper lip                 0.6663  0.6598
8   Lower lip                 0.4748  0.4518
9   Nose (right)              0.7256  0.7436
10  Nose (left)               0.6808  0.7281
11  Nose (center)             0.7734  0.8849
12  Nose (tip)                1.0000  1.0000
13  Ear lobe (right)          0.0000  0.0000
14  Ear lobe (left)           0.0000  0.0000
15  Chin                      0.0000  0.0000
5. Experimental results
This section gives the experimental results of face recognition. For example, given the relative affine structures previously stored in the database for Xu_Xf and a facial image of an unknown person Y, we can investigate the identity of Y by evaluating the similarity between the relative affine structures for Xu_Xf and those for Xu_Y. The result of the comparison is then transformed into a matching error score. If the score exceeds a threshold, the unknown person Y is identified as not being person X.
Table 2 shows the relative affine structure values for the fifteen facial features calculated for Au_Af and Au_Ad. Here, the dissimilarity between two corresponding relative affine structures, say $k_{1i}$ and $k_{2i}$, is calculated as $Ds_i = \max(k_{1i}/k_{2i}, k_{2i}/k_{1i})$. For feature points lying on the reference plane, the relative affine structures are 0 by definition and the dissimilarity values are set to 1. Finally, the overall dissimilarity between these two sets of relative affine structures is defined as the product of all $Ds_i$. For this example, the person with facial image Ad will be identified as person A since the overall dissimilarity, denoted as Ds_Au_Af_Ad, is very close to 1.
Table 3 gives results similar to those in Table 2 but using facial image Bd of person B instead of Ad. It is readily observable that there are major differences between quite a few corresponding relative affine structure pairs. In particular, if $k_{1i} \cdot k_{2i} < 0$, which means that the feature points are not on the same side of the reference plane in 3D space, the dissimilarity value is set to 2, which makes a large contribution to the overall dissimilarity. Since the overall dissimilarity of this example exceeds the threshold, person B is not identified as person A.
Fig. 6. Face image of side view of person F. The reference plane is defined by the two ear lobes and the chin. The 2D projections on images of these three feature points are used to calculate relative affine structures.
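The dissimilarity rules described above can be summarised in a short sketch. The function names and the tolerance `eps` are hypothetical; the case where exactly one of the two values is zero is not covered by the rules in the text and is penalised here like a sign change purely by assumption. The input values are those listed for Au_Af and Au_Ad in Table 2.

```python
import math

def dissimilarity(k1, k2, eps=1e-9):
    """Per-feature dissimilarity Ds_i following the rules described in the text."""
    if abs(k1) < eps and abs(k2) < eps:
        return 1.0            # both points lie on the reference plane
    if k1 * k2 <= 0:
        # Opposite sides of the reference plane -> 2 (rule from the text).
        # Exactly one zero value is not specified in the text; treated the same
        # way here by assumption.
        return 2.0
    return max(k1 / k2, k2 / k1)

def overall_dissimilarity(ks1, ks2):
    """Product of all per-feature dissimilarities."""
    return math.prod(dissimilarity(a, b) for a, b in zip(ks1, ks2))

# Relative affine structures for Au_Af and Au_Ad as listed in Table 2.
k_au_af = [0.9951, 0.9050, 0.8112, 0.7242, 0.4358, 0.4228, 0.6663, 0.4748,
           0.7256, 0.6808, 0.7734, 1.0000, 0.0000, 0.0000, 0.0000]
k_au_ad = [0.9510, 0.8961, 0.8183, 0.7189, 0.4271, 0.4409, 0.6719, 0.4871,
           0.7243, 0.6853, 0.7593, 1.0000, 0.0000, 0.0000, 0.0000]

overall = overall_dissimilarity(k_au_af, k_au_ad)
print(round(overall, 3))
```

Up to the rounding of the tabulated values, the product agrees with the overall dissimilarity of 1.2141 reported in Table 2. Because the measure is a product of ratios that are all at least 1, a single feature on the wrong side of the reference plane already doubles the score.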
To further improve the stability of the verification system, each facial image can be used in turn as the reference image, and a composite dissimilarity measure can be obtained as the geometric mean of the individual results. Table 4 shows the result of the verification of Af using Au and Ad, while Table 5 shows similar results using Bf instead of Af.
Table 2
Relative affine structures for Au_Af ($k_{1i}$) and Au_Ad ($k_{2i}$), and their dissimilarity $Ds_i = \max(k_{1i}/k_{2i}, k_{2i}/k_{1i})$
i   Feature point             k1i     k2i     Dsi
1   Right eye corner (outer)  0.9951  0.9510  1.0463
2   Right eye corner (inner)  0.9050  0.8961  1.0100
3   Left eye corner (inner)   0.8112  0.8183  1.0087
4   Left eye corner (outer)   0.7242  0.7189  1.0073
5   Mouth corner (right)      0.4358  0.4271  1.0204
6   Mouth corner (left)       0.4228  0.4409  1.0428
7   Upper lip                 0.6663  0.6719  1.0084
8   Lower lip                 0.4748  0.4871  1.0259
9   Nose (right)              0.7256  0.7243  1.0018
10  Nose (left)               0.6808  0.6853  1.0066
11  Nose (center)             0.7734  0.7593  1.0185
12  Nose (tip)                1.0000  1.0000  1.0000
13  Ear lobe (right)          0.0000  0.0000  1.0000
14  Ear lobe (left)           0.0000  0.0000  1.0000
15  Chin                      0.0000  0.0000  1.0000
Overall dissimilarity                         1.2141
Table 3
Relative affine structures for Au_Af ($k_{1i}$) and Au_Bd ($k_{2i}$), and their dissimilarity $Ds_i = \max(k_{1i}/k_{2i}, k_{2i}/k_{1i})$
i   Feature point             k1i     k2i     Dsi
1   Right eye corner (outer)  0.9951  1.0243  1.0293
2   Right eye corner (inner)  0.9050  2.3284  2.5727
3   Left eye corner (inner)   0.8112  3.5765  4.4088
4   Left eye corner (outer)   0.7242  4.5082  6.2254
5   Mouth corner (right)      0.4358  43.423  99.632
6   Mouth corner (left)       0.4228  -2.701  2.0000
7   Upper lip                 0.6663  -3.049  2.0000
8   Lower lip                 0.4748  -6.584  2.0000
9   Nose (right)              0.7256  -2.721  2.0000
10  Nose (left)               0.6808  -0.186  2.0000
11  Nose (center)             0.7734  -0.711  2.0000
12  Nose (tip)                1.0000  0.9999  1.0000
13  Ear lobe (right)          0.0000  0.0000  1.0000
14  Ear lobe (left)           0.0000  0.0000  1.0000
15  Chin                      0.0000  0.0000  1.0000
Overall dissimilarity                         4.63E+05
Table 4
Verification of Af using Au and Ad
                         Overall dissimilarity
Ds_Au_Af_Ad              1.2141
Ds_Af_Au_Ad              1.9270
Ds_Ad_Au_Af              1.6926
Composite dissimilarity  1.5821
The composite dissimilarity 1.5821 in Table 4 indicates that the facial image Af can be verified to be of person A. On the other hand, it is obvious that Bf is not a facial image of person A since the composite dissimilarity in Table 5 is too high.
By using the composite dissimilarity measure, a more robust identity verification system is developed and more experimental results are obtained. Table 6 shows the composite dissimilarities for the verification of facial images Au through Fu based on relative affine structures established using the front and downward looking facial images. Similarly, Table 7 verifies facial images Af through Ff and Table 8 verifies facial images Ad through Fd. It can be seen from these results that the threshold can be set comfortably at 2.5 for the composite dissimilarity, so that every person in our database is correctly verified with the proposed approach. The developed identity verification system thus successfully performs the verification on our experimental database of facial images.
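The composite measure and the threshold test can be sketched in a few lines. The function and variable names are hypothetical; the input values are the overall dissimilarities listed in Tables 4 and 5, and the threshold of 2.5 is the one quoted above.

```python
import math

THRESHOLD = 2.5  # threshold on the composite dissimilarity quoted in the text

def composite_dissimilarity(overall_values):
    """Geometric mean of the overall dissimilarities obtained with each
    image used in turn as the reference image."""
    return math.exp(sum(math.log(v) for v in overall_values) / len(overall_values))

def is_verified(overall_values, threshold=THRESHOLD):
    """Accept the claimed identity when the composite value is below the threshold."""
    return composite_dissimilarity(overall_values) < threshold

# Overall dissimilarities from Table 4 (Af against Au, Ad) and Table 5 (Bf against Au, Ad).
table4 = [1.2141, 1.9270, 1.6926]
table5 = [499057.33, 631.10, 1955.54]

composite_a = composite_dissimilarity(table4)
composite_b = composite_dissimilarity(table5)
print(round(composite_a, 4), is_verified(table4))
print(round(composite_b, 2), is_verified(table5))
```

Up to rounding, the two composite values reproduce the entries 1.5821 and 8508.22 quoted for Tables 4 and 5, so Af is accepted as person A while Bf is rejected. The geometric mean is computed through logarithms here simply to avoid forming a very large intermediate product.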
Table 5
Verification of Bf using Au and Ad
                         Overall dissimilarity
Ds_Au_Bf_Ad              499057.33
Ds_Bf_Au_Ad              631.10
Ds_Ad_Au_Bf              1955.54
Composite dissimilarity  8508.22
Table 6
Composite dissimilarities for the verification of facial images Au through Fu
    Af, Ad    Bf, Bd    Cf, Cd    Df, Dd    Ef, Ed    Ff, Fd
Au  1.58      22925.14  1509.41   16456.43  110.70    1995.84
Bu  95.19     1.87      439.52    1.48E+05  1.50E+06  1.23E+05
Cu  12.17     211.89    1.94      1.22E+05  351.99    86389.45
Du  4.08E+05  122.75    3055.30   1.61      50602.31  10.73
Eu  7.85E+06  2861.12   38943.89  62857.37  1.70      810.17
Fu  2.89E+05  2348.07   206.42    32115.82  1.29E+07  1.89
Table 7
Composite dissimilarities for verification of facial images Af through Ff
    Au, Ad    Bu, Bd    Cu, Cd    Du, Dd    Eu, Ed    Fu, Fd
Af  1.58      13738.93  9625.68   26941.43  8.06E+06  1.02E+06
Bf  8508.22   1.87      372.59    18755.60  1.15E+06  8985.88
Cf  11998.25  14725.11  1.94      2791.98   2013.52   5650.15
Df  3042.51   310.53    51170.67  1.61      425738    195.46
Ef  6979.54   1.90E+06  171595    3329.84   1.70      105005
Ff  151.60    2.03E+06  1146.79   1262.61   134511    1.89
Table 8
Composite dissimilarity for verification of facial images Ad through Fd
    Au, Af    Bu, Bf    Cu, Cf    Du, Df    Eu, Ef    Fu, Ff
Ad  1.58      149.81    11.38     2654.73   1336.72   162.89
Bd  705.13    1.87      422.78    3.18      4536.51   16885.11
Cd  10160.21  24.72     1.94      34498.32  6919.54   762.24
Dd  8318.70   1.09E+07  5.61E+05  1.61      1.75E+05  20261.21
Ed  14.15     3155.22   24.01     781.89    1.70      423.75
As for the sensitivity of the proposed algorithm, the relative affine structure is actually a cross ratio in a form which is quite stable numerically (see footnote 1). This can be seen from Fig. 3: an error of feature detection, in terms of the variance of image pixels on the image plane, will result in only a minor change in the depth of the spatial structure, e.g., $z$ and $z_0$, associated with a face. From the above simulation results, it seems that differences among face structures of different individuals are much more significant than the differences due to feature detection errors in facial images of the same person, which accounts for the robustness of the proposed approach.
6. Conclusion
This paper presents a study on a computer vision technique and its application in face recognition to achieve identity verification. The explicit relationship between the relative affine structure and the cross ratio, an invariant under perspective projection, is addressed. Subsequently, relative affine structures derived from multiple images are used for face recognition. The proposed method neither requires camera calibration nor reconstructs 3D models. Moreover, as long as the feature points of facial images are located accurately, the orientation and depth of the face are allowed to vary more freely. As shown in our preliminary experiments, the proposed approach does achieve satisfactory results given the feature points of facial images. A somewhat larger face database can be established for further investigation of the performance.
References
Atick, J.J., Griffin, P.A., Redlich, N.A., 1995. Face recognition from live video. Advance Imaging 10 (5), 58–62.
Belhumeur, P., Hespanha, J., Kriegman, D., 1997. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19 (7), 711–720.
Bichsel, M., Pentland, A.P., 1994. Human face recognition and the face image set's topology. CVGIP: Image Understanding 59 (2).
Chellappa, R., Wilson, C., Sirohey, S., 1995. Human and machine recognition of faces: A survey. Proc. IEEE 83 (5).
Colombe, J.B., 2003. A survey of recent developments in theoretical neuroscience and machine vision. In: Proc. 32nd Applied Imagery Pattern Recognition Workshop, pp. 205–213.
Eriksson, A., Weber, D., 1999. Towards 3-dimensional face recognition, vol. 1. In: Proc. IEEE Conf. on Africon, pp. 401–406.
Hotta, K., 2003. View-invariant face detection method based on local PCA cells. In: Proc. 12th Internat. Conf. on Image Analysis and Processing, pp. 57–62.
Jeng, S.H., Liao, H.Y.M., Han, C.C., Chern, M.Y., Liu, Y.T., 1998. Facial feature detection using geometrical face model: An efficient approach. Pattern Recognition 31 (3), 273–282.
Kanade, T., 1974. Picture processing system by computer complex and recognition of human faces. Ph.D. dissertation, Robotics Institute, Carnegie Mellon University.
Kriegman, D.J., Yang, M.H., Ahuja, N., 2002. Detecting faces in images: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 24 (1), 34–58.
Lades, M., Vorbruggen, J.C., Buhmann, J., Lage, J., von der Malsburg, C., Wurtz, R.P., Konen, W., 1994. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Comput. 42.
Lengagne, R., Tarel, J.P., Monga, O., 1996. From 2D images to 3D face geometry. In: Proc. IEEE Conf. on Automatic Face and Gesture Recognition, pp. 301–306.
Liao, H.Y.M., Han, C.C., Yu, G.J., Tyan, H.R., Chen, M.C., Chen, L.H., 1988. Face recognition using a face-only database: A new approach. In: Proc. Asian Conf. on Computer Vision, p. 1352.
Lin, S.H., Kung, S.Y., Lin, L.J., 1997. Face recognition/detection by probabilistic decision-based neural network. IEEE Trans. Neural Networks 8 (1).
Liu, J.S., Chuang, J.H., 2002. A geometry-based error estimation of cross-ratios. Pattern Recognition 35 (12), 155–167.
Mirhosseini, A.R., Yan, H., 1998. Human face image recognition: An evidence aggregation approach. Computer Vision and Image Understanding 71 (2), 213–230.
Pentland, A., Turk, M., 1991. Face recognition using eigenfaces. In: Proc. CVPR.
Phillips, P.J., 1998. Matching pursuit filters applied to face recognition. IEEE Trans. Image Process. 7 (8), 1150–1164.
Rolls, E.T., Stringer, S.M., 2001. Invariant object recognition in the visual system with error correction and temporal difference learning. Network: Comput. Neural Syst. 12, 111–129.
Rowley, H.A., Baluja, S., Kanade, T., 1998. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20 (1).
1 Please see Liu and Chuang (2002) for a comprehensive
Samal, A., Iyengar, P., 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recognition 25, 65–77.
Shashua, A., Navab, N., 1996. Relative affine structure: Canonical model for 3D from 2D geometry and applications. IEEE Trans. Pattern Anal. Mach. Intell. 18 (9), 873–883.
Turk, M., Pentland, A., 1991. Eigenfaces for recognition. J. Cognitive Neurosci. 3 (1).
Ya, Y., Zhang, J., 1998. Rotation-invariant 3D reconstruction for face recognition. In: Proc. IEEE Conf. on Image Processing, vol. 1, pp. 156–160.
Yang, G., Huang, T.S., 1994. Human face detection in a complex background. Pattern Recognition 27 (1), 53–63.
Zhang, J., Yan, Y., Lades, M., 1997. Face recognition: Eigenface, elastic matching and neural nets. Proc. IEEE 85 (9).