Identity verification by relative 3-D structure using multiple facial images

Jau Hong Kao*, Yen Heng Chen, Jen Hui Chuang

Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, Province of China

Received 3 December 2003; received in revised form 27 October 2004

Available online 15 December 2004

Abstract

Identity verification is one of the critical issues in the security sector and has been emerging as an active research area. In recent years, technologies that use biological features to address identity verification have attracted considerable research interest. For example, fingerprint recognition, voice recognition and recognition of the pattern of blood vessels in the retina have spawned many commercial applications. However, special and expensive equipment such as fingerprint readers and iris scanners is often required, and people occasionally have to adopt unpleasant poses. This paper presents a study on a computer vision technique and its application in face recognition to achieve identity verification. With multiple facial images taken from different view angles, relative affine structures are computed and used as measurements. To that end, the explicit relationship between the relative affine structure and the cross ratio, which is invariant under perspective projection, is also addressed. The proposed method neither requires camera calibration nor reconstructs 3D models. According to simulation results, the developed approach can achieve satisfactory results given the feature points of facial images.

Keywords: Identity verification; Relative affine structure; Cross ratio; Perspective projection

1. Introduction

Machine recognition of faces has been a very active research topic in recent years (Belhumeur et al., 1997; Chellappa et al., 1995; Samal and Iyengar, 1992; Zhang et al., 1997). Face recognition technology for still and video images has numerous potential commercial and law enforcement applications. These applications range from static matching of well-formatted photographs such as passports, credit cards, driver's licenses, and mug shots, to real-time matching of surveillance video images, and they present different constraints in terms of processing requirements. Although humans seem to recognize faces in cluttered scenes with relative ease, machine recognition, which often spans several disciplines such as image processing, pattern recognition, computer vision, and neural networks, is a much more daunting task. In particular, the problem can be formulated as follows: given still or video images of a scene, identify one or more persons in the scene using a stored database of faces.

A complete face recognition system generally includes two main stages. The first stage is the face detection stage, which determines the existence of one or more faces in an image. Techniques used in this stage involve segmentation of faces from cluttered scenes and extraction of features from the face region. The challenges are mainly due to the fact that the position, orientation and size of face regions in an arbitrary image are usually unknown (Rowley et al., 1998; Yang and Huang, 1994; Jeng et al., 1998). A survey of face detection techniques can be found in (Kriegman et al., 2002). The second stage is the recognition stage, which deals with the identification and matching problems. The goal is to determine the identities of the target faces obtained in the first stage. A brief survey of the important work on the recognition stage in the recent engineering literature is provided in what follows.

This work is partly supported by the National Science Council of Taiwan, Republic of China, under grant no. NSC92-2213E009006 and by the Ministry of Economic Affairs, Taiwan, Republic of China, under grant no. g3-EC17A02S1032.

* Corresponding author. E-mail addresses: gis88804@cis.nctu.edu.tw (J.H. Kao), jchuang@cis.nctu.edu.tw (J.H. Chuang).

Most existing face recognition algorithms are 2D-based. In terms of the nature of the facial features utilized, these 2D algorithms can generally be divided into two major categories: structure-based approaches and statistics-based approaches. The structure-based class uses structural facial features, which are mostly local structures, e.g., the shapes of the mouth, nose, and eyes (Mirhosseini and Yan, 1998; Lades et al., 1994; Kanade, 1974; Phillips, 1998). In (Kanade, 1974), an automated recognition system is developed that uses a top-down control strategy directed by a generic model of expected feature characteristics. Lades et al. (1994) proposed an elastic graph matching model which extracts the feature vectors from image lattices based on a set of 2D Gabor filters. The main advantage of a structure-based face recognition method is its low sensitivity to irrelevant data, e.g., moving hair or background, since it handles only the data of interest instead of using all image data indiscriminately. The main disadvantage of such approaches is the high complexity of feature extraction.

The statistics-based approaches basically use the whole 2D image as facial features (Belhumeur et al., 1997; Bichsel and Pentland, 1994; Lin et al., 1997; Liao et al., 1988). In this category of approaches, principal component analysis (PCA) is of particular importance (Hotta, 2003). The principal components of the training face images, e.g., Eigenfaces (Turk and Pentland, 1991; Pentland and Turk, 1991), are calculated and then used as an orthonormal basis. The complete space can be represented effectively by a significantly smaller subset of these orthonormal facial images, and the dimension of the feature space of facial images is thus reduced.

Moreover, theoretical neuroscience has contributed accounts of view-invariant perception, which is also the underlying idea of our work on identity verification, and of the perception of universals such as the explicit perception of featural parts and wholes in visual scenes. A survey of recent developments in theoretical neuroscience for machine vision can be found in (Colombe, 2003). These unsupervised learning methods are used to build predictive perceptual models of the spatial and temporal statistical structure of natural visual scenes. In particular, given the spatio-temporal continuity of the statistics of sensory input, invariant object recognition might be implemented using a learning rule that uses a trace of previous neural activity, capturing the same object under different transforms on a short time scale. By first relating a modified Hebbian rule to error correction rules and exploring a number of error correction rules that can be applied to invariant pattern recognition, Rolls and Stringer (2001) developed learning rules related to temporal difference learning. The analysis of temporal difference learning provides a theoretical framework for better understanding the operation and convergence properties of rules useful for learning invariant representations.

In contrast to structure-based approaches, statistics-based ones are more straightforward and simple. However, important local features may end up being used with only a small weight. As for theoretical neuroscience, it is not yet obvious whether the full power of learning rules is expressed in the brain, and practical applications in face recognition are needed to understand the performance. The work in (Rolls and Stringer, 2001) provides suggestions about how such rules might be implemented. Although the above 2D-based face recognition approaches produce satisfactory results under normal conditions, their performance can deteriorate quickly under varying lighting conditions or large changes in viewing geometry.

As face recognition technology is an essential tool in law enforcement agencies' efforts to combat crime, fake or duplicated facial images, which can easily cheat 2D-based facial recognition systems, raise problems of interest (Chellappa et al., 1995). To avoid such problems, a few 3D model-based face recognition approaches have been proposed, wherein 3D feature points are reconstructed to provide important information for facial recognition. In (Atick et al., 1995), a method based on the Karhunen-Loève expansion is developed to reconstruct 3D face features. The method is claimed to be independent of lighting conditions. In (Ya and Zhang, 1998), the reconstruction of the face surface is made rotation-invariant. A similar approach, which uses a depth map obtained from stereo images to perform face segmentation and recognition, can be found in (Lengagne et al., 1996). In (Eriksson and Weber, 1999), a model-matching approach is provided to reduce the computational cost of 3D-based facial recognition algorithms.

In this paper, we propose a novel approach to identifying a person from facial images using 3D information about facial feature points. Three reference points are first extracted to construct a reference plane in every image. By calculating a view-invariant relative depth, i.e., the relative affine structure introduced in (Shashua and Navab, 1996), with respect to the obtained reference plane for each relevant feature point, an efficient face recognition algorithm is developed using this robust measurement. Compared with other 3D approaches that require specific structures in Euclidean space (Atick et al., 1995; Ya and Zhang, 1998), the proposed method uses only a few facial feature points and requires no camera calibration. In addition, it does not require iterative training, which raises the issue of convergence in neural network approaches. Experimental results show that the developed approach performs satisfactorily on an experimental facial image database.

In the following sections, we first introduce the related projective geometry for one and two cameras. The geometrical relationships between two cameras, such as parallax and the relative affine structure, are discussed in Section 3, together with the geometrical meaning of this structure, which is expressed in terms of an invariant under perspective projection, i.e., the cross ratio. Algorithms for face recognition using the relative affine structure are presented in Section 4. Simulation results for an experimental facial image database are given in Section 5. Finally, conclusions are given in Section 6.

2. Projective geometry for one and two cameras

The basic procedure of projecting 3D points onto an image by a perspective camera can be described as

$$ m \cong P M, \tag{1} $$

where $\cong$ denotes equality up to a scaling factor, $P$ is the $3 \times 4$ projection matrix, and $M = [X\ Y\ Z\ 1]^T$ and $m = [x\ y\ 1]^T$ represent the homogeneous coordinates of a 3D world point and the corresponding image point, respectively. In general, the image coordinate system is defined in terms of image pixels. The general form of the projection matrix can be represented as

$$ P_{euc} \cong K P_0 T = \begin{bmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{bmatrix} [\, I \mid 0 \,] \begin{bmatrix} R & t \\ 0_3^T & 1 \end{bmatrix}. \tag{2} $$

In (2), $K$ gives the intrinsic parameters of the camera, i.e., of the imaging system. $T$ is a $4 \times 4$ matrix that describes the location and orientation (the pose) of the camera with respect to the world coordinate system in terms of a rotation $R$ and a translation $t$, which give the extrinsic parameters. For an ideal camera model, both $K$ and $T$ are identity matrices and (2) becomes

$$ m \cong P_0 M. \tag{3} $$
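As a minimal sketch of Eqs. (1) and (2), the following pure-Python snippet projects one homogeneous 3D point with $P = K[R \mid t]$, where $[R \mid t]$ stands for the product $P_0 T$. The numeric values of the intrinsics, the pose and the 3D point are illustrative assumptions and are not taken from the paper.

```python
# Pinhole projection m ~ P M with P = K [R | t]; all numbers are assumed for illustration.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def project(P, M):
    """Project a homogeneous 3D point M (length 4) with a 3x4 matrix P; return (x, y, 1)."""
    u = [sum(p * c for p, c in zip(row, M)) for row in P]
    return [u[0] / u[2], u[1] / u[2], 1.0]

# Intrinsic matrix K: focal lengths fx, fy, skew s, principal point (px, py).
fx, fy, s, px, py = 800.0, 820.0, 0.0, 320.0, 240.0
K = [[fx, s, px], [0.0, fy, py], [0.0, 0.0, 1.0]]

# Extrinsic part [R | t] = P0 T: identity rotation, zero translation.
Rt = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0]]

P = matmul(K, Rt)              # 3x4 projection matrix of Eq. (2)
M = [0.1, -0.05, 2.0, 1.0]     # homogeneous world point [X, Y, Z, 1]
m = project(P, M)              # normalized image point (x, y, 1) in pixels
print(m)
```

With $K = I$ the same function reduces to the ideal camera of Eq. (3).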

Consider two cameras taking pictures of an object, as illustrated in Fig. 1, wherein $C$ and $C'$ are the optical centers of the two cameras and $v$ and $v'$ are their associated image planes, respectively. The projection of $C'$ on $v$ observed from $C$, $e = PC'$, and the projection of $C$ on $v'$ observed from $C'$, $e' = P'C$, are defined as the epipoles of the two cameras, respectively. Without loss of generality, we assume that the world coordinate system is aligned with the image coordinate system of camera $C$; thus the projection matrices for $C$ and $C'$ become

$$ P = K_{3\times3}\,[\, I_{3\times3} \mid 0 \,] = [\, K \mid 0 \,], \tag{4} $$

$$ P' = K'_{3\times3}\,[\, R_{3\times3} \mid t_{3\times1} \,] = [\, K'R \mid K't \,]. \tag{5} $$

In addition, we have, by definition,

$$ PC = K_{3\times3}\,[\, I_{3\times3} \mid 0_{3\times1} \,]\, C_{4\times1} = 0 \quad \text{or} \quad C \cong [0\ 0\ 0\ 1]^T. $$

Since $e'$ is the projection of $C$ on $v'$,

$$ e' = P'C = K't. \tag{6} $$

Consider a 3D point $M$ whose depth is $z$ with respect to the camera coordinate system of camera $C$. Its projection on the image plane $v$, from (3), is equal to

$$ m \cong PM = K\tilde{M} \quad \text{with} \quad M = \begin{bmatrix} \tilde{M} \\ 1 \end{bmatrix} = \begin{bmatrix} zK^{-1}m \\ 1 \end{bmatrix} $$

if $m$ is normalized as $(x, y, 1)^T$. The projection on image plane $v'$ is then

$$ m' \cong P'M \cong K'RK^{-1}m + \frac{1}{z}K't. \tag{7} $$

With the above geometrical relationships and coordinate transformations between two cameras, Shashua and Navab (1996) derived the view-invariant relative affine structure. The following section provides a brief review, together with its explicit geometric meaning.

3. Relative affine structure and its geometric meaning

In (Shashua and Navab, 1996), an affine framework for perspective views is proposed, which is captured by a simple equation based on an invariant called the relative affine structure. It is shown there that the framework unifies Euclidean, projective and affine projection tasks in a natural and simple way. While the algebraic form of the relative affine structure is given clearly in (Shashua and Navab, 1996), as reviewed next, the direct relationship between the relative affine structure and a view-invariant cross ratio under perspective projection is derived at the end of this section.

Consider a reference plane $p$, and let the image points $m$ and $m'$ be the projections of a 3D point $M_p \in p$ on image planes $v$ and $v'$, respectively. The homography induced by $p$ can be obtained from $M_p = H_1 m$ and $M_p = H_2 m'$ as follows:

$$ m' = H_2^{-1} M_p = H_2^{-1} H_1 m = H_p m. \tag{8} $$

Since $H_p$ has eight independent entries (nine minus a scale factor), $H_p$ can be determined uniquely by solving a system of linear equations obtained from three point correspondences in general position on $p$ and the relationship $e' \cong H_p e$. Moreover, once $H_p$ is computed, we can use it to determine the positions of points on $p$ from a single image.
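The linear system mentioned above can be sketched as follows, assuming that the epipole pair $(e, e')$ is finite and is treated as a fourth point correspondence, and that the entry $H_p[3,3]$ is non-zero so that it can be fixed to 1. The function names, the ground-truth matrix `H_true` and the point coordinates are hypothetical and serve only to check the solver on synthetic data.

```python
# Homography from four correspondences (three plane points plus the epipole pair).
# Pure Python; H_true and all coordinates are assumed values for a synthetic check.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def apply_h(H, pt):
    """Map an inhomogeneous image point (x, y) through H."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def homography_from_4(src, dst):
    """Estimate H (with H[2][2] fixed to 1) from four correspondences (x, y) -> (u, v)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

H_true = [[1.1, 0.1, 0.2], [-0.05, 0.9, 0.1], [0.1, 0.05, 1.0]]
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.2, 0.9)]   # last entry plays the role of e
dst = [apply_h(H_true, p) for p in src]                   # last entry plays the role of e'
H_est = homography_from_4(src, dst)
```

In a real setting the correspondences would come from detected feature points and from epipoles derived from the fundamental matrix, and a different normalization would be needed if $H_p[3,3]$ were close to zero.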

The homogeneous coordinates of $p$ can be written as

$$ p = \begin{bmatrix} n_{3\times1} \\ d_p \end{bmatrix}, \tag{9} $$

where $n$ and $d_p$ describe the normal vector and the depth of $p$, respectively. For the projection $m$ of $M_p$ on the image plane $v$, we have

$$ m \cong P M_p = [\, K \mid 0 \,]\, M_p. $$

Since the depth of $M_p$ is unknown, we can assume that

$$ M_p = \begin{bmatrix} (K^{-1}m)_{3\times1} \\ q \end{bmatrix}. \tag{10} $$

On the other hand, since $M_p$ is on $p$, i.e., $p^T M_p = 0$, we have

$$ q = -\frac{1}{d_p}\, n^T K^{-1} m. \tag{11} $$

Now, by projecting $M_p$ on $v'$, we have

$$ m' = H_p m \cong P' M_p = K' \left( R - \frac{t\,n^T}{d_p} \right) K^{-1} m. \tag{12} $$

For more general scenes wherein not all of the 3D points are coplanar, parallax will be produced. For instance, $M$ is a 3D point which is not on the plane $p$ in Fig. 2, and $m''$ and $H_p m$ are the projections of $M$ and $M_p$ on $v'$, respectively. From (7), (8), (12) and $e' = K't$, we have

$$ m'' \cong K'RK^{-1}m + \frac{1}{z}K't = H_p m + \frac{z\,n^T K^{-1} m + d_p}{d_p z}\, e'. \tag{13} $$

For a point $M = [\, zK^{-1}m \ \ 1 \,]^T$ which is not on the reference plane $p$, the distance from $M$ to $p$ is equal to

$$ d = p^T M = z\,n^T K^{-1} m + d_p. \tag{14} $$

Substituting (14) into (13), we have

$$ m'' \cong H_p m + \frac{d}{d_p z}\, e' = H_p m + b\,e'. \tag{15} $$
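The decomposition in Eqs. (12) to (15) can be checked numerically. The sketch below assumes ideal cameras ($K = K' = I$), a unit normal $n$, and an arbitrary second-camera pose; all numeric values are illustrative assumptions.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed pose of the second camera: rotation about the y-axis plus a translation t.
a = math.radians(10.0)
R = [[math.cos(a), 0.0, math.sin(a)],
     [0.0, 1.0, 0.0],
     [-math.sin(a), 0.0, math.cos(a)]]
t = [0.3, 0.05, 0.1]

# Reference plane p = [n; d_p] in the frame of camera C: points X on it satisfy n.X + d_p = 0.
n = [0.0, 0.0, 1.0]
d_p = -4.0                       # the plane Z = 4

# Eq. (12) with K = K' = I: H_p = R - t n^T / d_p, and e' = t from Eq. (6).
H_p = [[R[i][j] - t[i] * n[j] / d_p for j in range(3)] for i in range(3)]
e_prime = t

m = [0.2, -0.1, 1.0]             # normalized image point in the first view
z = 3.0                          # depth of M with respect to camera C
M = [z * c for c in m]           # 3D point M = z K^{-1} m

d = dot(n, M) + d_p              # Eq. (14): signed distance from M to the plane
b = d / (d_p * z)                # parallax coefficient of Eq. (15)

lhs = [r + ti / z for r, ti in zip(matvec(R, m), t)]            # Eq. (7): R m + t / z
rhs = [h + b * ei for h, ei in zip(matvec(H_p, m), e_prime)]    # Eq. (15): H_p m + b e'
print(lhs, rhs)
```

For this choice of scale the two sides agree exactly, not only up to a scaling factor, and $b$ vanishes when $M$ lies on the reference plane.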

Since the value of the parallax term $b$ in (15) is normalized, $d_p$ can be dropped out, as stated in (Shashua and Navab, 1996). If we let $b_0 = 1$ for a reference point $M_0$ which is not on the reference plane (see Fig. 3), we are left with $d_p = d_0 / z_0$, and (15) can be rewritten as

$$ m'' \cong H_p m + \frac{z_0}{z}\,\frac{d}{d_0}\, e' = H_p m + k\,e', \tag{16} $$

with $k$ being the relative affine structure. In the following paragraph, we look at the relative affine structure from a different angle, namely its relationship with the cross ratio.

Fig. 2. An example of parallax. $M$ is a point which is not on the reference plane $p$.

Fig. 3. The geometry of the relative affine structure. $z$ and $z_0$ ...

In (16), it is not difficult to see that $k$ is an invariant quantity, since the variables $z_0$, $z$, $d_0$ and $d$ are governed by camera $C$ only. Consider Fig. 3. By extending $MM_0$, we obtain two intersection points $m$ and $M_p$, which are on $v$ and the reference plane $p$, respectively. By triangular similarity, we have

$$ k = \frac{z_0}{z}\,\frac{d}{d_0} = \frac{\overline{mM_0}}{\overline{mM}} \cdot \frac{\overline{MM_p}}{\overline{M_0M_p}} = CR(m, M, M_0, M_p). \tag{17} $$

This leads to the conclusion that the relative affine structure is in fact a measure of cross ratio.
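The view invariance of a cross ratio written in the form of Eq. (17) can be illustrated with a small sketch: four collinear 3D points are projected by an ideal pinhole camera, and the cross ratio measured along the image line is compared with the one measured along the 3D line. The points and the camera ($K = I$, camera at the origin) are assumptions chosen for the illustration.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio in the form of Eq. (17), (ac / ab) * (bd / cd), for signed positions on a line."""
    return ((c - a) / (b - a)) * ((d - b) / (d - c))

def project(X):
    """Ideal pinhole projection (K = I) of a 3D point given in camera coordinates."""
    return (X[0] / X[2], X[1] / X[2])

# Four collinear 3D points A + s * D for assumed parameters s.
A, D = (-1.0, 0.5, 4.0), (0.6, -0.2, 0.5)
params = [0.0, 1.0, 2.5, 4.0]
points = [tuple(a + s * dd for a, dd in zip(A, D)) for s in params]

cr_3d = cross_ratio(*params)

# Signed positions of the four image points along the image line.
img = [project(X) for X in points]
ux, uy = img[-1][0] - img[0][0], img[-1][1] - img[0][1]
pos = [(x - img[0][0]) * ux + (y - img[0][1]) * uy for x, y in img]
cr_img = cross_ratio(*pos)
print(cr_3d, cr_img)
```

The distances along the image line change with the viewpoint, but the two cross ratios coincide up to rounding error.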

Algorithm 1. Computation of relative affine structure for n pairs of image points

(1) Calculate the fundamental matrix $F$ with 8 pairs of correspondences.

(2) Derive the epipoles $e$ and $e'$ using $F^T e' = 0$ and $F e = 0$.

(3) Derive the homography $H_p$ of the reference plane with an epipole and 3 pairs of point correspondences.

(4) Choose a pair of corresponding points $m_0$ and $m'_0$, where $m_0$ and $m'_0$ are image points on the left image and the right image, respectively.

(5) Scale $H_p$ such that $m'_0 \cong H_p m_0 + e'$ ($k_0 = 1$).

(6) Obtain $k_i$ with $m'_i \cong H_p m_i + k_i e'$, $1 \le i \le n - 1$.

Since it is view-invariant, $k$ can be used as a useful feature to describe object points. Algorithm 1 summarizes the process of calculating the relative affine structure for n pairs of image points. By calculating the relative affine structures of the facial features of persons, we have developed an identity verification system based on face recognition using $k$, as discussed next.
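Steps (5) and (6) of Algorithm 1 can be sketched on synthetic data as follows. This is not the authors' implementation: steps (1) to (3) are skipped by building $H_p$ and $e'$ directly from an assumed scene with $K = K' = I$, and the helper names, poses and point coordinates are hypothetical. Arbitrary scale factors are applied to $H_p$ and $e'$ to mimic the fact that both are known only up to scale.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def rot_y(deg):
    a = math.radians(deg)
    return [[math.cos(a), 0.0, math.sin(a)],
            [0.0, 1.0, 0.0],
            [-math.sin(a), 0.0, math.cos(a)]]

def coefficient(target, base, direction):
    """Least-squares c such that `target` is parallel to base + c * direction."""
    a, d = cross(target, base), cross(target, direction)
    return -dot(a, d) / dot(d, d)

# Synthetic scene in the frame of camera C (all values are assumptions).
n, d_p = [0.0, 0.0, 1.0], -4.0           # reference plane Z = 4, i.e. n.X + d_p = 0
M0 = [0.0, 0.1, 3.0]                     # reference point off the plane (k0 = 1)
points = [[0.5, 0.2, 3.4], [-0.4, -0.3, 4.0], [0.3, -0.5, 3.7]]

def plane_distance(M):
    return dot(n, M) + d_p

# Closed form of Eq. (16): k = (z0 / z) * (d / d0).
expected = [(M0[2] / M[2]) * (plane_distance(M) / plane_distance(M0)) for M in points]

def structures_for_view(R, t, scale_H, scale_e):
    """Steps (5)-(6) of Algorithm 1 for a second camera (R, t), with K = K' = I."""
    H = [[scale_H * (R[i][j] - t[i] * n[j] / d_p) for j in range(3)] for i in range(3)]
    e_prime = [scale_e * c for c in t]

    def first(M):                        # m in the first image
        return [M[0] / M[2], M[1] / M[2], 1.0]

    def second(M):                       # m' in the second image (homogeneous)
        return [r + c for r, c in zip(matvec(R, M), t)]

    # Step (5): scale H so that m0' ~ s H m0 + e', which fixes k0 = 1.
    s = coefficient(second(M0), e_prime, matvec(H, first(M0)))
    # Step (6): k_i from m_i' ~ s H m_i + k_i e'.
    return [coefficient(second(M), [s * c for c in matvec(H, first(M))], e_prime)
            for M in points]

view1 = structures_for_view(rot_y(8.0), [0.4, 0.0, 0.1], 2.7, -0.4)
view2 = structures_for_view(rot_y(-12.0), [-0.3, 0.2, 0.05], 0.5, 3.0)
print(expected, view1, view2)
```

Both second views yield the same values, which also match the closed form $(z_0/z)(d/d_0)$ of Eq. (16); the point placed on the reference plane gets $k = 0$.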

4. Face recognition using relative affine structures

With the properties of the view-invariant relative affine structure investigated in the previous section, this section presents the proposed approach to face recognition using such invariants. Recall that the relative affine structure of an object point depends only on the configuration of the first camera $C$, the position of the reference plane $p$ and the reference point $M_0$. Two facial images are therefore used first to derive the relative affine structure for each feature point. The first image is designated as the reference image, and the extracted facial features are stored together with the obtained relative affine structures. To verify the identity of a new facial image, a new set of relative affine structures is obtained from the reference facial image and the new image. The similarity between the stored relative affine structures and the new set of relative affine structures is then evaluated. Finally, the identity is verified by checking whether the similarity is higher than a specified threshold.

The extraction of essential features from two facial images and the procedure adopted in this paper to obtain relative affine structures from the extracted features are explained here. To focus on the correctness of the theory, feature points are obtained manually from facial images taken from different points of view. On each given face image, fifteen feature points, including eye and mouth corners, nose tip, ear lobes, etc., are extracted as shown in Fig. 4(b). The image of the front view of person A is labeled $A_f$, while the upward- and downward-looking facial images are labeled $A_u$ and $A_d$, respectively. In the same manner, three images of each of the other persons are also taken. For example, Fig. 5 shows the images obtained for person B.

Table 1 shows the relative affine structures obtained for persons A and B with $A_u$ and $B_u$ being the reference images, respectively. Since the reference plane is defined by the left ear lobe (point 14), the right ear lobe (point 13) and the chin (point 15), as illustrated in Fig. 6, the relative affine structure values of these three points are all zero. The value of the relative affine structure of the nose tip (point 12), which is the reference point $M_0$, is defined as 1 for normalization. Since the distance from the camera to a person is usually several meters, $z_0/z$ in (17) is close to 1. Thus, the values of the other relative affine structures given in (17) are close to $d/d_0$. From Fig. 6, we can see that the ratio for the eye corner is close to unity and the ratio for the mouth corner is about 0.4, while the ratios for the upper and lower lips are about 0.65 and 0.45, respectively.

In our experiments, we use six groups of facial images for persons A through F (see Fig. 7 for facial images $C_f$ through $F_f$). Each group consists of three images from three different points of view. On a personal computer equipped with a 333 MHz Pentium II processor and 128 MB of memory, the program, implemented with MATLAB 6.1 under Microsoft Windows 2000, takes 0.1 second to obtain the relative affine structures for each data set, e.g., Au_Af with $A_u$ being the reference image. A database is used to store the information obtained from the facial images. The details of the verification stage using this database and a method to improve the performance of the verification are given in the next section.

Fig. 4. Face images of person A. From left to right, the images are labeled $A_u$, $A_f$, and $A_d$, respectively.

Fig. 5. Face images of person B. From left to right, the images are labeled $B_u$, $B_f$, and $B_d$, respectively.

Table 1
Relative affine structures obtained for persons A ($k_{1i}$) and B ($k_{2i}$) using Au_Af and Bu_Bf, respectively

i    Feature point               k1i      k2i
1    Right eye corner (outer)    0.9951   1.0111
2    Right eye corner (inner)    0.9050   1.0391
3    Left eye corner (inner)     0.8112   1.0400
4    Left eye corner (outer)     0.7242   1.0590
5    Mouth corner (right)        0.4358   0.4594
6    Mouth corner (left)         0.4228   0.3430
7    Upper lip                   0.6663   0.6598
8    Lower lip                   0.4748   0.4518
9    Nose (right)                0.7256   0.7436
10   Nose (left)                 0.6808   0.7281
11   Nose (center)               0.7734   0.8849
12   Nose (tip)                  1.0000   1.0000
13   Ear lobe (right)            0.0000   0.0000
14   Ear lobe (left)             0.0000   0.0000

5. Experimental results

This section gives the experimental results of face recognition. For example, given the relative affine structures previously stored in the database for Xu_Xf and a facial image of an unknown person Y, we can investigate the identity of Y by evaluating the similarity between the relative affine structures for Xu_Xf and those for Xu_Y. The result of the comparison is then transformed into a matching error score. If the score exceeds a threshold, the unknown person Y is identified as not being person X.

Table 2 shows the relative affine structure values for the fifteen facial features calculated for Au_Af and Au_Ad. Here, the dissimilarity between two corresponding relative affine structures, say $k_{1i}$ and $k_{2i}$, is calculated as $Ds_i = \max(k_{1i}/k_{2i},\, k_{2i}/k_{1i})$. For feature points lying on the reference plane, the relative affine structures are zero by definition and the dissimilarity values are set to 1. The overall dissimilarity between these two sets of relative affine structures is then defined as the product of all the $Ds_i$. In this example, the person with facial image $A_d$ is identified as person A, since the overall dissimilarity, denoted Ds_Au_Af_Ad, is very close to 1.

Fig. 6. Face image of the side view of person F. The reference plane is defined by the two ear lobes and the chin. The 2D projections of these three feature points on the images are used to calculate relative affine structures.

Table 3 gives results similar to those in Table 2 but using facial image $B_d$ of person B instead of $A_d$. It is readily observable that there are major differences between quite a few corresponding pairs of relative affine structures. In particular, if $k_{1i} k_{2i} < 0$, which means that the feature points are not on the same side of the reference plane in 3D space, the dissimilarity value is set to 2, which makes a big contribution to the overall dissimilarity. Since the overall dissimilarity in this example exceeds the threshold, person B is not identified as person A.
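The dissimilarity rules described in this section can be sketched in a few lines. The function names and the tolerance `eps` used to detect points on the reference plane are assumptions; the case where only one of the two values is zero is not covered by the text and is therefore rejected explicitly. The data are the $k_{1i}$ and $k_{2i}$ columns of Table 2.

```python
def dissimilarity(k1, k2, eps=1e-9):
    """Per-feature dissimilarity Ds_i between two relative affine structure values."""
    if abs(k1) < eps and abs(k2) < eps:
        return 1.0                       # both points lie on the reference plane
    if abs(k1) < eps or abs(k2) < eps:
        raise ValueError("case not specified in the text: only one value is zero")
    if k1 * k2 < 0:
        return 2.0                       # opposite sides of the reference plane
    return max(k1 / k2, k2 / k1)

def overall_dissimilarity(ks1, ks2):
    """Product of the per-feature dissimilarities."""
    result = 1.0
    for k1, k2 in zip(ks1, ks2):
        result *= dissimilarity(k1, k2)
    return result

# Table 2: Au_Af (k1) and Au_Ad (k2) for the fifteen feature points.
k1 = [0.9951, 0.9050, 0.8112, 0.7242, 0.4358, 0.4228, 0.6663, 0.4748,
      0.7256, 0.6808, 0.7734, 1.0000, 0.0000, 0.0000, 0.0000]
k2 = [0.9510, 0.8961, 0.8183, 0.7189, 0.4271, 0.4409, 0.6719, 0.4871,
      0.7243, 0.6853, 0.7593, 1.0000, 0.0000, 0.0000, 0.0000]

overall = overall_dissimilarity(k1, k2)
print(round(overall, 3))
```

Up to the rounding of the tabulated values, the product agrees with the overall dissimilarity of 1.2141 reported in Table 2.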

To further improve the stability of the verification system, each facial image can in turn be used as the reference image, and a composite dissimilarity measure, the geometric mean of the individual results, can be obtained. Table 4 shows the result of the verification of $A_f$ using $A_u$ and $A_d$, while Table 5 shows similar results using $B_f$ instead of $A_f$.

Table 2
Relative affine structures for Au_Af ($k_{1i}$) and Au_Ad ($k_{2i}$), and their dissimilarity $Ds_i = \max(k_{1i}/k_{2i},\, k_{2i}/k_{1i})$

i    Feature point               k1i      k2i      Dsi
1    Right eye corner (outer)    0.9951   0.9510   1.0463
2    Right eye corner (inner)    0.9050   0.8961   1.0100
3    Left eye corner (inner)     0.8112   0.8183   1.0087
4    Left eye corner (outer)     0.7242   0.7189   1.0073
5    Mouth corner (right)        0.4358   0.4271   1.0204
6    Mouth corner (left)         0.4228   0.4409   1.0428
7    Upper lip                   0.6663   0.6719   1.0084
8    Lower lip                   0.4748   0.4871   1.0259
9    Nose (right)                0.7256   0.7243   1.0018
10   Nose (left)                 0.6808   0.6853   1.0066
11   Nose (center)               0.7734   0.7593   1.0185
12   Nose (tip)                  1.0000   1.0000   1.0000
13   Ear lobe (right)            0.0000   0.0000   1.0000
14   Ear lobe (left)             0.0000   0.0000   1.0000
15   Chin                        0.0000   0.0000   1.0000
     Overall dissimilarity                         1.2141

Table 3
Relative affine structures for Au_Af ($k_{1i}$) and Au_Bd ($k_{2i}$), and their dissimilarity $Ds_i = \max(k_{1i}/k_{2i},\, k_{2i}/k_{1i})$

i    Feature point               k1i      k2i       Dsi
1    Right eye corner (outer)    0.9951   1.0243    1.0293
2    Right eye corner (inner)    0.9050   2.3284    2.5727
3    Left eye corner (inner)     0.8112   3.5765    4.4088
4    Left eye corner (outer)     0.7242   4.5082    6.2254
5    Mouth corner (right)        0.4358   43.423    99.632
6    Mouth corner (left)         0.4228   -2.701    2.0000
7    Upper lip                   0.6663   -3.049    2.0000
8    Lower lip                   0.4748   -6.584    2.0000
9    Nose (right)                0.7256   -2.721    2.0000
10   Nose (left)                 0.6808   -0.186    2.0000
11   Nose (center)               0.7734   -0.711    2.0000
12   Nose (tip)                  1.0000   0.9999    1.0000
13   Ear lobe (right)            0.0000   0.0000    1.0000
14   Ear lobe (left)             0.0000   0.0000    1.0000
15   Chin                        0.0000   0.0000    1.0000
     Overall dissimilarity                          4.63E+05

Table 4
Verification of $A_f$ using $A_u$ and $A_d$

                          Overall dissimilarity
Ds_Au_Af_Ad               1.2141
Ds_Af_Au_Ad               1.9270
Ds_Ad_Au_Af               1.6926
Composite dissimilarity   1.5821

The composite dissimilarity of 1.5821 in Table 4 indicates that the facial image $A_f$ can be verified to be of person A. On the other hand, it is obvious that $B_f$ is not a facial image of person A, since the composite dissimilarity in Table 5 is too high.

By using the composite dissimilarity measure, a more robust identity verification system is developed and more experimental results are obtained.

Table 6 shows the composite dissimilarities for the verification of facial images $A_u$ through $F_u$ based on the relative affine structures established using the front and downward-looking facial images. Similarly, Table 7 verifies facial images $A_f$ through $F_f$ and Table 8 verifies facial images $A_d$ through $F_d$. It can be seen from these results that the threshold for the composite dissimilarity can be set comfortably at 2.5, so that every person in our database is correctly verified with the proposed approach. The developed identity verification system thus successfully performs the verification on our experimental database of facial images.
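The composite measure and the threshold test can be sketched as follows, using the overall dissimilarities listed in Tables 4 and 5. The function names are assumptions, and the rule "accept when the composite dissimilarity is below 2.5" is one reading of the threshold stated above.

```python
import math

THRESHOLD = 2.5   # threshold on the composite dissimilarity suggested in the text

def composite_dissimilarity(values):
    """Geometric mean of the overall dissimilarities, one per choice of reference image."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def is_same_person(values, threshold=THRESHOLD):
    """Accept the identity claim when the composite dissimilarity is below the threshold."""
    return composite_dissimilarity(values) < threshold

table4 = [1.2141, 1.9270, 1.6926]          # verification of Af using Au and Ad
table5 = [499057.33, 631.10, 1955.54]      # verification of Bf using Au and Ad

print(round(composite_dissimilarity(table4), 4), is_same_person(table4))
print(round(composite_dissimilarity(table5), 2), is_same_person(table5))
```

The geometric mean keeps a single very large factor from being averaged away, so an impostor image stays far above the threshold even when one reference choice gives a moderate value.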

Table 5
Verification of $B_f$ using $A_u$ and $A_d$

                          Overall dissimilarity
Ds_Au_Bf_Ad               499057.33
Ds_Bf_Au_Ad               631.10
Ds_Ad_Au_Bf               1955.54
Composite dissimilarity   8508.22

Table 6
Composite dissimilarities for the verification of facial images $A_u$ through $F_u$

      Af, Ad      Bf, Bd      Cf, Cd      Df, Dd      Ef, Ed      Ff, Fd
Au    1.58        22925.14    1509.41     16456.43    110.70      1995.84
Bu    95.19       1.87        439.52      1.48E+05    1.50E+06    1.23E+05
Cu    12.17       211.89      1.94        1.22E+05    351.99      86389.45
Du    4.08E+05    122.75      3055.30     1.61        50602.31    10.73
Eu    7.85E+06    2861.12     38943.89    62857.37    1.70        810.17
Fu    2.89E+05    2348.07     206.42      32115.82    1.29E+07    1.89

Table 7
Composite dissimilarities for the verification of facial images $A_f$ through $F_f$

      Au, Ad      Bu, Bd      Cu, Cd      Du, Dd      Eu, Ed      Fu, Fd
Af    1.58        13738.93    9625.68     26941.43    8.06E+06    1.02E+06
Bf    8508.22     1.87        372.59      18755.60    1.15E+06    8985.88
Cf    11998.25    14725.11    1.94        2791.98     2013.52     5650.15
Df    3042.51     310.53      51170.67    1.61        425738      195.46
Ef    6979.54     1.90E+06    171595      3329.84     1.70        105005
Ff    151.60      2.03E+06    1146.79     1262.61     134511      1.89

Table 8
Composite dissimilarities for the verification of facial images $A_d$ through $F_d$

      Au, Af      Bu, Bf      Cu, Cf      Du, Df      Eu, Ef      Fu, Ff
Ad    1.58        149.81      11.38       2654.73     1336.72     162.89
Bd    705.13      1.87        422.78      3.18        4536.51     16885.11
Cd    10160.21    24.72       1.94        34498.32    6919.54     762.24
Dd    8318.70     1.09E+07    5.61E+05    1.61        1.75E+05    20261.21
Ed    14.15       3155.22     24.01       781.89      1.70        423.75

As for the sensitivity of the proposed algorithm, the relative affine structure is actually a cross ratio in a form which is quite stable numerically (see footnote 1). It can be seen from Fig. 3 that an error in feature detection, in terms of the variance of image pixels on the image plane, will result in only a minor change in the depth of the spatial structure, e.g., $z$ and $z_0$, associated with a face. From the above simulation results, it seems that the differences among the face structures of different individuals are much more significant than the differences due to feature detection errors in facial images of the same person, which accounts for the robustness of the proposed approach.

6. Conclusion

This paper presents a study on a computer vision technique and its application in face recognition to achieve identity verification. The explicit relationship between the relative affine structure and the cross ratio, an invariant under perspective projection, is addressed. Subsequently, relative affine structures derived from multiple images are used for face recognition. The proposed method neither requires camera calibration nor reconstructs 3D models. Moreover, as long as the feature points of facial images are located accurately, the orientation and depth of the face are allowed to vary more freely. As shown in our preliminary experiments, the proposed approach does achieve satisfactory results given the feature points of facial images. A somewhat larger face database can be established for further investigation of the performance.

References

Atick, J.J., Griffin, P.A., Redlich, N.A., 1995. Face recognition from live video. Advance Imaging 10 (5), 58–62.

Belhumeur, P., Hespanha, J., Kriegman, D., 1997. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19 (7), 711–720.

Bichsel, M., Pentland, A.P., 1994. Human face recognition and the face image set's topology. CVGIP: Image Understanding 59 (2).

Chellappa, R., Wilson, C., Sirohey, S., 1995. Human and machine recognition of faces: A survey. Proc. IEEE 83 (5).

Colombe, J.B., 2003. A survey of recent developments in theoretical neuroscience and machine vision. In: Proc. 32nd Applied Imagery Pattern Recognition Workshop, pp. 205–213.

Eriksson, A., Weber, D., 1999. Towards 3-dimensional face recognition, vol. 1. In: Proc. IEEE Conf. on Africon, pp. 401–406.

Hotta, K., 2003. View-invariant face detection method based on local PCA cells. In: Proc. 12th Internat. Conf. on Image Analysis and Processing, pp. 57–62.

Jeng, S.H., Liao, H.Y.M., Han, C.C., Chern, M.Y., Liu, Y.T., 1998. Facial feature detection using geometrical face model: An efficient approach. Pattern Recognition 31 (3), 273–282.

Kanade, T., 1974. Picture processing system by computer complex and recognition of human faces. Ph.D. dissertation, Robotics Institute, Carnegie Mellon University.

Kriegman, D.J., Yang, M.H., Ahuja, N., 2002. Detecting faces in images: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 24 (1), 34–58.

Lades, M., Vorbruggen, J.C., Buhmann, J., Lage, J., von der Malsburg, C., Wurtz, R.P., Konen, W., 1994. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Comput. 42.

Lengagne, R., Tarel, J.P., Monga, O., 1996. From 2D images to 3D face geometry. In: Proc. IEEE Conf. on Automatic Face and Gesture Recognition, pp. 301–306.

Liao, H.Y.M., Han, C.C., Yu, G.J., Tyan, H.R., Chen, M.C., Chen, L.H., 1988. Face recognition using a face-only database: A new approach. In: Proc. Asian Conf. on Computer Vision, p. 1352.

Lin, S.H., Kung, S.Y., Lin, L.J., 1997. Face recognition/detection by probabilistic decision-based neural network. IEEE Trans. Neural Networks 8 (1).

Liu, J.S., Chuang, J.H., 2002. A geometry-based error estimation of cross-ratios. Pattern Recognition 35 (12), 155–167.

Mirhosseini, A.R., Yan, H., 1998. Human face image recognition: An evidence aggregation approach. Computer Vision and Image Understanding 71 (2), 213–230.

Pentland, A., Turk, M., 1991. Face recognition using eigenfaces. In: Proc. CVPR.

Phillips, P.J., 1998. Matching pursuit filters applied to face recognition. IEEE Trans. Image Process. 7 (8), 1150–1164.

Rolls, E.T., Stringer, S.M., 2001. Invariant object recognition in the visual system with error correction and temporal difference learning. Network: Comput. Neural Syst. 12, 111–129.

Rowley, H.A., Baluja, S., Kanade, T., 1998. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20 (1).

1 Please see Liu and Chuang (2002) for a comprehensive

Samal, A., Iyengar, P., 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recognition 25, 65–77.

Shashua, A., Navab, N., 1996. Relative affine structure: Canonical model for 3D from 2D geometry and applications. IEEE Trans. Pattern Anal. Mach. Intell. 18 (9), 873–883.

Turk, M., Pentland, A., 1991. Eigenfaces for recognition. J. Cognitive Neurosci. 3 (1).

Ya, Y., Zhang, J., 1998. Rotation-invariant 3D reconstruction for face recognition. In: Proc. IEEE Conf. on Image Processing, vol. 1, pp. 156–160.

Yang, G., Huang, T.S., 1994. Human face detection in a complex background. Pattern Recognition 27 (1), 53–63.

Zhang, J., Yan, Y., Lades, M., 1997. Face recognition: Eigenface, elastic matching and neural nets. Proc. IEEE 85 (9).