Why recognition in a statistics-based face recognition system should be based on the pure face portion: a probabilistic decision-based proof


This work was supported by the National Science Council under grant no. NSC87-2213-E-001-025.

* Corresponding author. Tel.: +886-2-27883799 ext. 1811; fax: +886-2-27824814.

E-mail address: liao@iis.sinica.edu.tw (H.-Y.M. Liao).


Li-Fen Chen, Hong-Yuan Mark Liao*, Ja-Chen Lin, Chin-Chuan Han

Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan
Institute of Information Science, Academia Sinica, 128 Sinica Road, Sec. 2, Nankang, Taipei 11529, Taiwan

Received 24 September 1999; received in revised form 24 April 2000; accepted 24 April 2000

Abstract

It is evident that the process of face recognition, by definition, should be based on the content of a face. The problem is: what is a "face"? Recently, a state-of-the-art statistics-based face recognition system, the PCA plus LDA approach, has been proposed (Swets and Weng, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 831-836). However, the authors used "face" images that included hair, shoulders, face and background. Our intuition tells us that only a recognition process based on a "pure" face portion can be called face recognition. The mixture of irrelevant data may result in an incorrect set of decision boundaries. In this paper, we propose a statistics-based technique to quantitatively prove our assertion. For the purpose of evaluating how the different portions of a face image will influence the recognition results, a hypothesis testing model is proposed. We then implement the above-mentioned face recognition system and use the proposed hypothesis testing model to evaluate the system. Experimental results show that the influence of the "real"-face portion is much less than that of the nonface portion. This outcome confirms quantitatively that recognition in a statistics-based face recognition system should be based solely on the "pure" face portion. (c) 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Statistics-based face recognition; Face-only database; Hypothesis testing

1. Introduction

Face recognition has been a very popular research topic in recent years [1-5]. It covers a wide variety of application domains, including security systems, person identification, image and film processing, and human-computer interaction. A complete face recognition system should include two stages. The first stage is detecting the location and size of a "face", which is difficult and complicated because of the unknown position, orientation and scaling of faces in an arbitrary image [6-13]. The second stage of a face recognition system involves recognizing the target faces obtained in the first stage. In order to design a good face recognition system, the features chosen for recognition play a crucial role. In the literature [14-17], two main approaches to feature extraction have been extensively used. The first one is based on extracting structural facial features, that is, local structures of face images such as the shapes of the eyes, nose, and mouth. The structure-based approaches are not affected by irrelevant data, such as hair or background, because they deal with local data instead of global data. On the other hand, the statistics-based approaches extract features from the whole image.


Since the global data of an image are used to determine the set of decision boundaries, data which are irrelevant to facial portions should be disregarded. Otherwise, these irrelevant portions may contribute to the decision boundary determination process and later mislead the recognition results. From the psychological viewpoint, Hay and Young [18] pointed out that the internal facial features, such as the eyes, nose, and mouth, are very important for human beings to see and to recognize familiar faces. However, it was also pointed out in [19] that, in statistics-based systems, if face images in the database cover the face, hair, shoulders, and background, the "facial" portion will not play a key role during the execution of "face" recognition. In Ref. [20], Bruce et al. compared two of the most successful systems, one proposed by Pentland et al. based on PCA [14,21] and the other proposed by von der Malsburg et al. [22,23] based on graph matching. They indicated that the PCA system gave higher correlations to the ratings obtained with hair than did the graph matching system.

In recent years, many researchers have noticed this problem and tried to exclude the irrelevant "nonface" portions while performing face recognition. In Ref. [1], Belhumeur et al. eliminated the nonface portion of face images with dark backgrounds. Similarly, Goudail et al. [24] constructed face databases under constrained conditions, such as asking people to wear dark jackets and to sit in front of a dark background. In Ref. [14], Turk and Pentland multiplied the input face image by a two-dimensional Gaussian window centered on the face to diminish the effect caused by the nonface portion. For the same purpose, Sung et al. [8] tried to eliminate the near-boundary pixels of a normalized face image by using a fixed-size mask. Moghaddam and Pentland [11] and Lin et al. [16] both used probabilistic face detectors to extract facial features or cut out the middle portion of a face image for correct recognition. In Ref. [19], Liao et al. proposed a face-only database as the basis for face recognition. All the above-mentioned works tried to use the most "correct" information for the face recognition task. In addition, the works in Refs. [1,19] conducted related experiments showing that if the database contains "full face" images, changing the background or hair style may decrease the recognition rate significantly. However, they only tried to explain the phenomena observed in their experiments; a quantitative measure was not introduced to support their assertion. In a statistics-based face recognition system, global information (at the pixel level) is used to determine the set of decision boundaries and to perform recognition. Therefore, a mixture of irrelevant data may result in an incorrect set of decision boundaries. The question is: can we measure, quantitatively, the influence of the irrelevant data on the face recognition result? In this paper, we shall use a statistics-based technique to perform this task.

In order to conduct the experiments, two different face databases were adopted. One was a training database built under constrained environments. The other was a synthesized face database which contained a set of synthesized face images. Every synthesized face image consisted of two parts: one was the middle face portion, which includes the eyes, nose, and mouth of a face image; the other was the complement of the middle face, called the "nonface" portion, of another face image. We will show in detail how to construct these two face databases in the following sections. Based on these two databases, the distances between the distribution of the original training images and that of the synthesized images could be calculated. For the purpose of evaluating how the different portions of a face image influence the recognition result, a hypothesis testing model was employed. We then implemented a state-of-the-art face recognition system and used the proposed hypothesis testing model to evaluate the system. Experimental results obtained from the system show that the influence of the middle face portion on the recognition process is much less than that of the nonface portion. This outcome is important because it proves, quantitatively and statistically, that recognition in statistics-based face recognition systems should be based on pure-face databases.

The organization of this paper is as follows. In Section 2, a state-of-the-art face recognition system, which will be examined in this paper, is introduced. Descriptions of the proposed hypothesis testing model and experimental results are given in Sections 3 and 4, respectively. Conclusions are drawn in Section 5.

2. State-of-the-art: PCA plus LDA face recognition

In this section, a state-of-the-art face recognition system, which was implemented and used in the experiments, will be introduced. Swets and Weng [25] first proposed principal component analysis (PCA) plus linear discriminant analysis (LDA) for face recognition. They applied the PCA technique to reduce the dimensionality of the original image. In their work, the top 15 principal axes were selected and used to derive a 15-dimensional feature vector for every sample. These transformed samples were then used as bases to execute LDA. In other words, their approach can be decomposed into two processes, the PCA process followed by the LDA process. All the details can be found in Ref. [25]. They reported a peak recognition rate of more than 90%. Recently, Belhumeur et al. [1] and Zhao et al. [2] have proposed systems which use a similar methodology; the former is named "Fisherfaces". The methodology adopted in the above-mentioned approaches is efficient and correct. However, for a statistics-based face recognition system like that in Ref. [25], we would like to point out that the database used in their system is incorrect. According to Ref. [25], the face images used in their system contained face, hair, shoulders, and background, not solely face.

(3)

We wonder whether the inclusion of irrelevant "facial" portions, such as hair, shoulders, and background, will generate incorrect decision boundaries for recognition. Therefore, in this paper, we shall answer this question based on results obtained using statistical methods. Since the method proposed in Ref. [25] combines the PCA and LDA techniques to decide on the projection axes for recognition, we shall briefly introduce the PCA and LDA approaches, respectively, in the following paragraphs.

Principal component analysis (PCA) finds a set of the most expressive projection vectors such that the projected samples retain the most information about the original samples. The most expressive vectors derived from a PCA process are the eigenvectors corresponding to the largest eigenvalues of the total scatter matrix, $S_T = \sum_{i=1}^{N} (s_i - m)(s_i - m)^T$ [26]. The projection takes the form

$$t_i = W^T (s_i - m), \qquad (1)$$

where $s_i$ is the $i$th original sample, $m = (1/N)\sum_{i=1}^{N} s_i$ is the total mean vector, and $t_i$ is the projected sample of $s_i$ through $W$, the matrix whose columns are the projection vectors. The corresponding computational algorithm of a PCA process can be found in Ref. [25].
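To make the computation concrete, here is a minimal NumPy sketch of this PCA step (the function and variable names are ours, not from Ref. [25]; the SVD shortcut is an implementation choice that yields the eigenvectors of $S_T$ without forming the $d \times d$ matrix):

```python
import numpy as np

def pca_projection(samples, n_components=15):
    """Project mean-centered samples onto the top eigenvectors of the
    total scatter matrix S_T, as in Eq. (1).

    samples: (N, d) array, one flattened face image per row.
    Returns W (d, n_components) and the projected samples (N, n_components).
    """
    m = samples.mean(axis=0)               # total mean vector m
    centered = samples - m                 # s_i - m
    # The right singular vectors of the centered data matrix are the
    # eigenvectors of S_T = sum_i (s_i - m)(s_i - m)^T.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    W = vt[:n_components].T                # leading eigenvectors as columns
    projected = centered @ W               # t_i = W^T (s_i - m)
    return W, projected
```

Swets and Weng kept the top 15 principal axes, which is why `n_components` defaults to 15 in this sketch.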

In the normal LDA process, one determines the mapping

$$v_{km} = A^T u_{km}, \qquad (2)$$

where $u_{km}$ denotes the feature vector extracted from the $m$th face image of the $k$th class and $v_{km}$ denotes the projective feature vector of $u_{km}$ under the transformation of the mapping matrix $A$. This mapping simultaneously maximizes the between-class scatter while minimizing the within-class scatter of all $v_{km}$'s (where $k = 1, \dots, K$; $m = 1, \dots, M$) in the projective feature vector space. Here, in the PCA plus LDA approach, $u_{km}$, the input of LDA, is the projective sample obtained from Eq. (1), the output of PCA. Let $\bar{v}_k = (1/M)\sum_{m=1}^{M} v_{km}$ and $\bar{v} = (1/K)\sum_{k=1}^{K} \bar{v}_k$. The within-class scatter matrix in the projective feature space can be calculated as follows [27]:

$$S_w = \sum_{k=1}^{K} \sum_{m=1}^{M} (v_{km} - \bar{v}_k)(v_{km} - \bar{v}_k)^T. \qquad (3)$$

The between-class scatter matrix in the same space can be calculated as follows:

$$S_b = \sum_{k=1}^{K} (\bar{v}_k - \bar{v})(\bar{v}_k - \bar{v})^T. \qquad (4)$$

The way to find the required mapping $A$ is to maximize the following quantity:

$$\mathrm{tr}(S_w^{-1} S_b). \qquad (5)$$

An algorithm which solves for the mapping matrix $A$ can be found in Ref. [28]. However, the major drawback of applying LDA is that the within-class scatter matrix $S_w$ in Eq. (5) may be singular when the number of samples is smaller than the dimensionality of the samples. Some researchers have proposed different approaches to solve this problem [1,28,29]. In the PCA plus LDA approach [1], samples are first projected into a reduced-dimensional space through the PCA process such that $S_w$ in the following LDA process is guaranteed to be nonsingular. A Euclidean distance classifier can then be used to perform classification in the mapped space.
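Under the same assumptions (NumPy, hypothetical names), the LDA step of Eqs. (2)-(5) can be sketched as follows; `features` holds the PCA outputs $u_{km}$ and `labels` their class indices:

```python
import numpy as np

def lda_mapping(features, labels, n_axes=None):
    """Solve for a mapping A that maximizes tr(S_w^{-1} S_b), Eqs. (3)-(5).

    features: (N, p) array of PCA-projected samples u_km.
    labels:   length-N array of class indices.
    Returns A with the most discriminating axes as its leading columns.
    """
    p = features.shape[1]
    v_bar = features.mean(axis=0)              # global mean
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for k in np.unique(labels):
        cls = features[labels == k]
        v_bar_k = cls.mean(axis=0)             # class mean
        diff = cls - v_bar_k
        S_w += diff.T @ diff                   # Eq. (3)
        db = (v_bar_k - v_bar)[:, None]
        S_b += db @ db.T                       # Eq. (4)
    # S_w is nonsingular here because PCA already reduced the dimension,
    # so the generalized eigenproblem can be solved directly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]     # most discriminating first
    A = eigvecs[:, order].real
    return A[:, :n_axes] if n_axes is not None else A
```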

3. Hypothesis testing model

We mentioned in the previous section that the inclusion of irrelevant "facial" portions, such as hair, shoulders, and background, will mislead the face recognition process. In this section, we shall propose a statistics-based hypothesis testing model to prove our assertion. Before going further, we shall define some basic notations which will be used later.

Let $X_k = \{x_{km},\ m = 1, \dots, M \mid x_{km}$ is the feature vector extracted from the $m$th face image of the $k$th person$\}$ denote the set of feature vectors of the $M$ face images of class $\omega_k$ (person $k$), where $x_{km}$ is a $d$-dimensional column vector and each class collects $M$ different face images of a person. For simplicity, the $M$ face images of every person are labelled and arranged in order. Each class is then represented by a likelihood function. Without loss of generality, assume that the class likelihood function, $p(x \mid \omega_k)$, of class $\omega_k$ is a normal distribution [30]:

$$p(x \mid \omega_k) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right], \qquad (6)$$

where $x$ is a $d$-dimensional column vector, and $\mu$ and $\Sigma$ are the mean vector and covariance matrix of $p(x \mid \omega_k)$, respectively. Some researchers [11,16,31] have used this model to describe the face images of the same person (class) and adopted different criteria to estimate the parameters $\mu$ and $\Sigma$. Here, for simplicity, we use the sample mean, $\bar{x}_k = (1/M)\sum_{m=1}^{M} x_{km}$, and the sample covariance matrix, $\Sigma_k = (1/M)\sum_{m=1}^{M} (x_{km} - \bar{x}_k)(x_{km} - \bar{x}_k)^T$, as the estimates of $\mu$ and $\Sigma$, respectively.
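A short sketch of these estimates (our naming; note the biased $1/M$ normalization used in the text, rather than $1/(M-1)$):

```python
import numpy as np

def class_gaussian(X_k):
    """Sample mean and (biased, 1/M) sample covariance of one class.

    X_k: (M, d) array of feature vectors x_km for person k.
    Returns (mean, cov) as estimates of (mu, Sigma).
    """
    M = X_k.shape[0]
    mean = X_k.mean(axis=0)              # sample mean
    centered = X_k - mean
    cov = (centered.T @ centered) / M    # divides by M, matching the text
    return mean, cov
```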

For each vector set $X_k$ of class $\omega_k$ ($k = 1, \dots, K$), additional vector sets, $Y_k^l$ ($l = 1, \dots, K$, $l \neq k$), are extracted and associated with it. The number of elements in $Y_k^l$ (for a specific $l$) is equal to $M$, which is exactly the same as the number of elements in $X_k$. The formation of the elements in $Y_k^l$ is as follows. First, we manually mark three landmarks on each face image to locate the positions of the eyes and the mouth.


Fig. 1. Examples of detecting middle face portions from face images. (a) and (c) show two face images, each of which contains three landmarks. (b) and (d) show the corresponding middle face portions of (a) and (c), respectively. (e) and (f) are the synthetic face images obtained by exchanging the middle face portions of (a) and (c).

According to these landmarks, each face image can be adequately cropped to form an image block of the corresponding middle face. Two examples showing how to construct synthetic face images are shown in Fig. 1. Fig. 1(a) and (c) show two face images with landmarks. The landmarks on these images are manually located. According to the proportions of the distances between these landmarks, the corresponding middle face portions can be formed, as shown in Fig. 1(b) and (d), respectively. From Fig. 1(b) and (d), it is easy to see that faces of different sizes will be adequately cropped. To construct a synthetic face image, a normalization process is applied to deal with issues such as scale, brightness, and boundary. Fig. 1(e) shows the synthetic face image which is synthesized from the nonface portion of (a) and the middle face image of (c). Similarly, Fig. 1(f) shows the synthetic face image which is synthesized from the nonface portion of (c) and the middle face image of (a). Hence, each element in $Y_k^l$ is a $d$-dimensional feature vector extracted from a synthesized face image which combines the middle face portion of an element in $\omega_l$ and the nonface portion of its corresponding element in $\omega_k$. We have mentioned that the $M$ elements in $X_k$ (extracted from $\omega_k$, $k = 1, \dots, K$) are arranged in order (from 1 to $M$). Therefore, the synthesized face image sets, as well as the feature sets extracted from them, are all arranged in order. The reason why we ordered these images is that we want to make the synthesized images as real as possible. This is achievable when the images are obtained under constrained environments, such as a controlled lighting condition, fixed view orientations, and neutral expression. In sum, for each vector set $X_k$ of class $\omega_k$ ($k = 1, \dots, K$), there are $(K-1)$ synthesized feature sets associated with it. In what follows, we shall provide formal definitions of the synthesized sets.

Let $w_{qp}$ denote the $p$th face image of class $\omega_q$ ($p = 1, \dots, M$). For $l = 1, \dots, K$, $l \neq k$, we have the $(K-1)$ feature sets associated with $X_k$, defined as follows:

$$Y_k^l = \{y_k^l(m),\ m = 1, \dots, M \mid y_k^l(m) \text{ is a } d\text{-dimensional feature vector extracted from a synthesized face image which combines the middle face portion of } w_{lm} \text{ and the nonface portion of } w_{km}\}. \qquad (7)$$

Fig. 2 is a graphical illustration showing how $Y_k^l$ is extracted. One thing to be noted is that when we combine two different portions of images, some rescaling and normalization preprocessing is necessary in order to reduce boundary variations. Fig. 3 is a typical example illustrating how the synthesized face image is combined from the middle face portion of an image in $\omega_l$ and the nonface portion of its corresponding image in $\omega_k$.
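The paper does not spell out the compositing algorithm, but the idea reduces to a masked copy once the images are aligned and normalized. A minimal sketch (the function name and the boolean-mask representation are our assumptions):

```python
import numpy as np

def synthesize(face_k, face_l, middle_mask):
    """Build the synthesized image behind y_k^l(m): middle face portion
    from person l, nonface portion from person k.

    face_k, face_l: (H, W) grayscale images, already aligned, rescaled,
                    and brightness-normalized as described in the text.
    middle_mask: (H, W) boolean array, True inside the middle face region
                 derived from the eye and mouth landmarks.
    """
    out = face_k.copy()                      # nonface portion of w_km
    out[middle_mask] = face_l[middle_mask]   # middle face portion of w_lm
    return out
```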

Bichsel and Pentland [15] have shown, from the topological viewpoint, that when a face undergoes changes in its eye width, nose length, and hair style, it is still recognized as a human face. Therefore, it is reasonable to also represent the above-mentioned feature vector set, $Y_k^l$, by a normal distribution function. Now, since all the feature vector sets are represented by normal distributions, their distances can only be evaluated by using some specially defined metrics. In the literature [32-35], the Bhattacharyya distance is a well-known metric defined for measuring the similarity (correlation) between two arbitrary statistical distributions. A lower distance between two distributions means a higher correlation between them.


Fig. 2. Each rectangle in the left column represents one face image, and the circle area is the middle face portion. The middle entry in the left column shows that each synthesized face image corresponding to vector $y_k^l(m)$ is obtained by combining the middle face portion of $w_{lm}$ in class $\omega_l$ with the nonface portion of its counterpart $w_{km}$ in class $\omega_k$.

Fig. 3. Examples of synthesized face images. (a) The $m$th face image in $\omega_k$, $w_{km}$; (b) the $m$th face image in $\omega_l$, $w_{lm}$; (c) the synthesized face image obtained by combining the middle face portion of $w_{lm}$ and the nonface portion of $w_{km}$; the extracted feature vector corresponding to this synthesized face image is $y_k^l(m)$; (d) some other examples with five different $l$'s (persons).

For two arbitrary distributions $p(x \mid \omega_i)$ and $p(x \mid \omega_j)$ of classes $\omega_i$ and $\omega_j$, respectively, the general form of the Bhattacharyya distance is defined as

$$D(\omega_i, \omega_j) = -\ln \int \sqrt{p(x \mid \omega_i)\, p(x \mid \omega_j)}\, dx. \qquad (8)$$

When both $\omega_i$ and $\omega_j$ are normal distributions, the Bhattacharyya distance can be simplified into the following form:

$$D(\omega_i, \omega_j) = \frac{1}{8} (\mu_i - \mu_j)^T \left[\frac{\Sigma_i + \Sigma_j}{2}\right]^{-1} (\mu_i - \mu_j) + \frac{1}{2} \ln \frac{\left|(\Sigma_i + \Sigma_j)/2\right|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}}, \qquad (9)$$

where $\mu_i$, $\mu_j$ and $\Sigma_i$, $\Sigma_j$ are the mean vectors and covariance matrices of $\omega_i$ and $\omega_j$, respectively [30].


Fig. 4. In the top rows of (a) and (b), each rectangle region together with the circle region inside it represents a face image. The mark $k$ or $l$ denotes the class to which that region belongs. The feature vectors in the middle rows of (a) and (b) are extracted from the corresponding face images (pure or synthesized). The assemblages of all vectors (e.g., $x_{km}$) form normal distributions of the corresponding vector sets (e.g., $X_k$). The bottom rows of (a) and (b) represent the difference between the two distributions, which can be computed using the Bhattacharyya distance.

In what follows, we shall define a hypothesis testing model for use as a tool in the experiments. The Bhattacharyya distance will be used as a decision criterion for determining acceptance or rejection of our hypothesis.
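Eq. (9) translates directly into code. A minimal NumPy version (the use of `slogdet` and `solve` for numerical stability is our implementation choice):

```python
import numpy as np

def bhattacharyya(mu_i, cov_i, mu_j, cov_j):
    """Bhattacharyya distance between two Gaussians, Eq. (9)."""
    cov_avg = 0.5 * (cov_i + cov_j)
    diff = mu_i - mu_j
    # First term: (1/8) (mu_i - mu_j)^T [(Sigma_i + Sigma_j)/2]^{-1} (mu_i - mu_j)
    term1 = 0.125 * diff @ np.linalg.solve(cov_avg, diff)
    # Second term: (1/2) ln( |cov_avg| / sqrt(|Sigma_i| |Sigma_j|) )
    _, ld_avg = np.linalg.slogdet(cov_avg)
    _, ld_i = np.linalg.slogdet(cov_i)
    _, ld_j = np.linalg.slogdet(cov_j)
    term2 = 0.5 * (ld_avg - 0.5 * (ld_i + ld_j))
    return term1 + term2
```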

3.1. The hypothesis testing model

In the hypothesis testing, our goal was to prove that the influence of the nonface portions of face images on the recognition result is larger than that of the middle face portions; that is, the nonface portion of a face image dominates the recognition result. In what follows, we shall define a metric based on the above-mentioned Bhattacharyya distance. The metric to be defined for a specific class $k$ is a real-number set, $D_k$, defined as follows:

$$D_k = \{d_k(l),\ l = 1, \dots, K;\ l \neq k \mid d_k(l) = D(X_k, Y_k^l) - D(X_l, Y_k^l)\}, \qquad (10)$$

where $D(\cdot)$ represents the Bhattacharyya distance between two distributions as defined in Eq. (9).

For a specific class $k$, there are in total $K-1$ elements contained in $D_k$. The physical meaning of each constituent of $D_k$, i.e., $d_k(l)$ ($l = 1, \dots, K$; $l \neq k$), is a statistical measure that can be used to evaluate, quantitatively, the relative importance of the middle face portion and the nonface portion. Fig. 4 illustrates graphically how $d_k(l)$ is calculated. Fig. 4(a) shows how the first term defining $d_k(l)$ is calculated. The top row of Fig. 4(a) contains two rectangles, each of which includes a circle region. The rectangle region together with the circle region inside represents a face image. The left-hand side combination contains two $k$'s. This means that the middle face portion (the circle region) and the nonface portion (the rectangle region excluding the circle region) belong to the same person. The right-hand side combination, on the other hand, contains the nonface portion belonging to person $k$ and the middle face portion belonging to person $l$. The middle row of Fig. 4(a) shows the corresponding feature vectors extracted from the (pure) face image on the left-hand side and the synthesized face image on the right-hand side, respectively. The assemblages of $x_{km}$ and $y_k^l(m)$ each contain $M$ elements. The bottom rows of Fig. 4(a) and (b) represent, respectively, the difference between the two distributions, which can be computed using the Bhattacharyya distance as defined in Eq. (9). In what follows, we shall report how the relative importance of the middle face portion and the nonface portion can be determined based on the value of $d_k(l)$.

From Eq. (10), it is obvious that when $d_k(l) \geq 0$, the distribution of $Y_k^l$ is closer to that of $X_l$ than to that of $X_k$. Otherwise, the distribution of $Y_k^l$ is closer to that of $X_k$ than to that of $X_l$. According to the definition of face recognition, the recognition process should be dominated by the middle face portion. In other words, the normal situation should result in a $d_k(l)$ whose value is not less than zero. If, unfortunately, the result turns out to be $d_k(l) < 0$, then this means that the nonface portion dominates the face recognition process. We have mentioned that for a specific class $k$, there are in total $K-1$ possible synthesized face image sets. Therefore, we shall have $K-1$ values of $d_k(l)$ (for $l = 1, \dots, K$, $l \neq k$). From the statistical viewpoint, if more than half of these $d_k(l)$ values are less than zero, then the face recognition process regarding person $k$ is dominated by the nonface portion. The formal definition of the test for person $k$ is as follows:

$$\bar{H}_k : p(d_k(l) \geq 0;\ d_k(l) \in D_k) \geq 0.5,$$
$$H_k : p(d_k(l) \geq 0;\ d_k(l) \in D_k) < 0.5, \qquad (11)$$


Fig. 5. PCA plus LDA based face recognition using a synthesized face image as the query image: (a) and (b) are the original face images of two persons. The leftmost image of (c) is the query image synthesized from (a) and (b), and the other images are the top 5 closest retrieved images (ordered from left to right).

where $\bar{H}_k$ represents the null hypothesis, $H_k$ stands for the alternative hypothesis, and $p(\cdot)$ here represents the probability decided under a predefined criterion. According to the definition of $D_k$, it contains $K-1$ real values $d_k(l)$. Therefore, the rules defined in Eq. (11) will let the null hypothesis $\bar{H}_k$ be accepted whenever the number of $d_k(l)$ values not less than zero is more than one half of $K-1$; otherwise, the alternative hypothesis $H_k$ will be accepted.

The rules described in Eq. (11) apply only to a specific class $k$. If they are extended to the whole population, a global hypothesis test rule is required. The extension is trivial and can be written as follows:

$$\bar{H} : p(\bar{H}_k \text{ is accepted},\ k = 1, \dots, K) \geq 0.5,$$
$$H : p(\bar{H}_k \text{ is accepted},\ k = 1, \dots, K) < 0.5. \qquad (12)$$

The physical meaning of the rules described in Eq. (12) is that when over half of the population passes the null hypothesis test, the global null hypothesis $\bar{H}$ is accepted; otherwise, the global alternative hypothesis will be accepted. When the latter occurs, this means that the nonface portion of a face image dominates the face recognition process for the majority of the whole population.
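Putting the pieces together, the test of Eqs. (10)-(12) can be sketched as follows, reusing the hypothetical `class_gaussian` and `bhattacharyya` helpers sketched earlier (the data layout is our assumption):

```python
import numpy as np

def test_population(X, Y):
    """Evaluate the global hypothesis of Eq. (12).

    X: dict mapping class k to an (M, d) array of pure-face vectors X_k.
    Y: dict mapping (k, l) to an (M, d) array of synthesized vectors Y_k^l.
    Returns True if the global null hypothesis (face portion dominates)
    is accepted, False if the alternative hypothesis is accepted.
    """
    gauss = {k: class_gaussian(X_k) for k, X_k in X.items()}
    accepted = []
    for k in X:
        d_k = []
        for l in X:
            if l == k:
                continue
            mu_y, cov_y = class_gaussian(Y[(k, l)])
            # d_k(l) = D(X_k, Y_k^l) - D(X_l, Y_k^l), Eq. (10)
            d_k.append(bhattacharyya(*gauss[k], mu_y, cov_y)
                       - bhattacharyya(*gauss[l], mu_y, cov_y))
        # Eq. (11): accept the per-class null hypothesis when at least
        # half of the d_k(l) values are nonnegative.
        accepted.append(np.mean(np.array(d_k) >= 0) >= 0.5)
    # Eq. (12): the global null hypothesis holds when at least half of
    # the classes accept their individual null hypotheses.
    return np.mean(accepted) >= 0.5
```

In the experiments reported below, every per-class null hypothesis was rejected, so the global alternative hypothesis was accepted.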

4. Experimental results

Before showing our experimental results, we will first review a series of experiments conducted in Ref. [17]. Liao et al. conducted a series of experiments using the synthesized images $Y_k^l$. From the results, they found that in the PCA plus LDA approach, the nonface portion dominated the whole recognition process. The results shown in Fig. 5 indicate that the face portion did not play a key role in the recognition process. A similar experiment was conducted in Ref. [1]. Belhumeur et al. partitioned the images into two different scales: one included the full face and part of the background, while the other was closely cropped and included only internal facial structures such as the brow, eyes, nose, mouth, and chin. They found that the recognition rate using a full-face database is much better than that using a closely cropped database; however, when the background or hair style of the full-face images was varied, the recognition rate dropped significantly and was even worse than that obtained using closely cropped images. These results encouraged us to construct a formal (quantitative) proof of the problem.

In the experiments described below, the statistics-based state-of-the-art face recognition system proposed by Swets and Weng [25] was implemented and tested against the proposed hypothesis testing model. The training database contained 90 persons (classes), and each class contained 30 different face images of the same person. The 30 face images of each class were labelled and ordered according to the orientations in which they were obtained. These orientations included ten frontal views, ten frontal views rotated 15° to the right, and ten frontal views rotated 15° to the left. The process for collecting facial images was as follows: each person was asked to sit down in front of a CCD camera, with a neutral expression, and to move the head slightly in the three different orientations; a 30-second period for each orientation was recorded on videotape under a well-controlled lighting condition. Later, a frame grabber was used to grab 10 image frames for each orientation from the videotape, and these were stored at a resolution of 155×175 pixels. Since these images were obtained under the same conditions, the synthesized images used in our hypothesis testing would look very similar to real images visually.


Fig. 6. The distributions of two-dimensional vectors extracted using the PCA plus LDA approach. Each node represents the feature vector extracted from a face image, and there were 30 nodes for each person. 'o' and 'x' represent $X_k$ and $X_l$ of persons $k$ and $l$, respectively. '+' stands for $Y_k^l$, which represents the synthesized images obtained by combining the middle face portion of person $l$ with the nonface portion of person $k$. The horizontal axis and vertical axis are, respectively, the most discriminating and the second most discriminating projection axes in the projective feature space. This figure shows that '+' ($Y_k^l$) was very close to class 'o' ($X_k$).

Fig. 7. The experimental results for $D_k$ obtained using the PCA plus LDA approach. 'o' is the distance between $X_k$ and $Y_k^l$, and '+' is the distance between $X_l$ and $Y_k^l$. (a) shows the values of the first term ('o') and the second term ('+') of every $d_k(l)$ in $D_k$, $l = 2, \dots, 90$, where $k = 1$; (b) shows the individual probabilities $p(d_k(l) \geq 0;\ d_k(l) \in D_k)$, $k = 1, \dots, 90$. These figures show that $Y_k^l$ will be classified into class $k$, which includes the nonface portion of $Y_k^l$.

For the PCA plus LDA approach proposed by Swets and Weng [25], each projective feature vector extracted from a face image is 15-dimensional. Based on these feature vectors of the training samples, the proposed hypothesis model was tested. Since the projection axes derived through linear discriminant analysis were ordered according to their discriminating capability, the first projection axis was the most discriminating, followed by the second. For the convenience of visualization, all the samples were projected onto the first two projection axes, and the projections are shown in Fig. 6 for the proposed hypothesis model.

Fig. 6 shows the three related distributions covered in $D_k$. 'o' and 'x' represent $X_k$ of person $k$ and $X_l$ of person $l$, respectively, and '+' represents $Y_k^l$, whose elements combine the middle face portion of person $l$ and the nonface portion of person $k$. The distributions of $X_k$, $X_l$, and $Y_k^l$ each covered 30 elements (two-dimensional vectors). Each distribution is enclosed by an ellipse, which was drawn based on the distribution's scaled variance in each dimension. Therefore, most of the feature vectors belonging to the same class are enclosed in the same ellipse. In Fig. 6, it is obvious that the distribution of $Y_k^l$ is closer to that of $X_k$. This means that the nonface portions of the set of face images dominated the distribution of the projective feature vector set. That is, the distribution of $Y_k^l$ was completely disjoint from that of $X_l$ and almost completely overlapped that of $X_k$. From the viewpoint of classification, each element in $Y_k^l$ would be classified into class $k$, the one which contributes the nonface portion of the test image. In sum, the experimental results shown in Fig. 6 confirm that the nonface portion of a face image did dominate the distributions of the two-dimensional projective feature vectors.

Fig. 7 shows the experimental results obtained by applying the proposed hypothesis testing model. In this case, $k$ was set to 1. That is, $l$ ranged from 2 to 90 (horizontal axis). The 'o' sign shown in Fig. 7(a) represents the Bhattacharyya distance (vertical axis) between $X_k$ and $Y_k^l$, which is the first term of $d_k(l)$.


The '+' sign shown in Fig. 7(a), on the other hand, represents the Bhattacharyya distance (also plotted on the vertical axis) between $X_l$ and $Y_k^l$, and is the second term of $d_k(l)$. The results shown in Fig. 7(a) reflect that from $l = 2$ to 90, the second term ('+') of $d_k(l)$ was always larger than its first term ('o'). Therefore, we can say that for $k = 1$ (class 1), the probability that the first term of $d_k(l)$ ($l = 2, \dots, 90$) was larger than the second term of $d_k(l)$ is zero. This means that the distance between $X_k$ and $Y_k^l$ was always smaller than the distance between $X_l$ and $Y_k^l$ for $k = 1$, $l = 2, \dots, 90$. One thing worth noticing is that the PCA plus LDA approach had the ability to extract very "discriminating" projection axes, since the distributions $X_l$ and $X_k$ of different persons were far apart. Therefore, the phenomenon whereby the nonface portion dominated the face recognition process was very apparent in the results obtained using the PCA plus LDA approach. This conclusion is confirmed by the individual probability values shown in Fig. 7(b). Fig. 7(b) shows, from classes 1 to 90, the individual probability that the first term of $d_k(l)$ ($l = 2, \dots, 90$) was larger than the second term of $d_k(l)$. From this figure, it is obvious that most of the individual probabilities (ranging from 1 to 90) were zero. Only a few individual probabilities had values very close to zero (less than 0.05). From the data shown in Fig. 7(b), we can draw the conclusion that all the individual null hypotheses $\bar{H}_k$ ($k = 1, \dots, 90$) were rejected, and that the probability of accepting $\bar{H}_k$ ($k = 1, \dots, 90$) was equal to zero. Moreover, since $p(\bar{H}_k \text{ is accepted},\ k = 1, \dots, K) = 0$, the global alternative hypothesis $H$ is accepted. This means that for the whole population in this database, the nonface portion of a face image, including hair, shoulders and background, dominates the face recognition process. A possible reason for this phenomenon is that the number of pixels in the background can exceed the number of pixels in the face portion of the synthesized images. (Therefore, with a PCA-based algorithm, it is not unreasonable to expect that the synthesized face will match the background better than the face.)

5. Conclusion

In this paper, we have proposed a statistics-based technique to quantitatively prove that a previously proposed face recognition system used "incorrect" databases. According to the definition of face recognition, the recognition process should be dominated by the "pure" face portion. However, after implementing a state-of-the-art statistics-based face recognition system based on PCA plus LDA, we have shown, quantitatively, that the influence of the middle face portion on the recognition process in that system was much smaller than that of the nonface portion. That is, the nonface portion of a face image dominated the recognition result. This outcome is very important because it proves, quantitatively and statistically, that some of the previous statistics-based face recognition systems have used "incorrect" face databases. This outcome also reminds us that if we adopt databases established by other people, a preprocessing stage has to be introduced. The purpose of the preprocessing stage is to guarantee the correct use of a face database.

References

[1] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.

[2] W. Zhao, R. Chellappa, A. Krishnaswamy, Discriminant analysis of principal components for face recognition, Proceedings of the Third Conference on Automatic Face and Gesture Recognition, Japan, April 1998, pp. 336–340.

[3] R. Chellappa, C. Wilson, S. Sirohey, Human and machine recognition of faces: a survey, Proc. IEEE 83 (5) (1995) 705–740.

[4] D. Valentin, H. Abdi, A. O'Toole, G. Cottrell, Connectionist models of face processing: a survey, Pattern Recognition 27 (9) (1994) 1209–1230.

[5] A. Samal, P. Iyengar, Automatic recognition and analysis of human faces and facial expressions: a survey, Pattern Recognition 25 (1) (1992) 65–77.

[6] S.A. Sirohey, Human face segmentation and identification, Master's Thesis, University of Maryland, 1993.

[7] G. Yang, T.S. Huang, Human face detection in a complex background, Pattern Recognition 27 (1) (1994) 53–63.

[8] K.K. Sung, T. Poggio, Example-based learning for view-based human face detection, A.I. Memo 1521, MIT Press, Cambridge, 1994.

[9] B. Moghaddam, A. Pentland, Probabilistic visual learning for object detection, Proceedings of the Fifth IEEE Conference on Computer Vision, 1995, pp. 786–793.

[10] P. Juell, R. Marsh, A hierarchical neural network for human face detection, Pattern Recognition 29 (5) (1996) 781–787.

[11] B. Moghaddam, A. Pentland, Probabilistic visual learning for object representation, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 696–710.

[12] S.H. Jeng, H.Y. Mark Liao, C.C. Han, M.Y. Chern, Y.T. Liu, Facial feature detection using geometrical face model: an efficient approach, Pattern Recognition 31 (3) (1998) 273–282.

[13] C.C. Han, H.Y. Mark Liao, G.J. Yu, L.H. Chen, Fast face detection via morphology-based pre-processing, Pattern Recognition 33 (10) (2000) 1701–1712.

[14] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71–86.

[15] M. Bichsel, A.P. Pentland, Human face recognition and the face image set's topology, CVGIP: Image Understanding 59 (2) (1994) 254–261.

[16] S.H. Lin, S.Y. Kung, L.J. Lin, Face recognition/detection by probabilistic decision-based neural network, IEEE Trans. Neural Networks 8 (1) (1997) 114–132.


[17] H.Y. Mark Liao, C.C. Han, G.J. Yu, H.R. Tyan, M.C. Chen, L.H. Chen, Face recognition using a face-only database: a new approach, Proceedings of the Third Asian Conference on Computer Vision, Hong Kong, Lecture Notes in Computer Science, Vol. 1352, January 1998, pp. 742–749.

[18] D. Hay, A.W. Young, The human face, in: A.W. Ellis (Ed.), Normality and Pathology in Cognitive Functions, Academic Press, New York, 1982.

[19] H.Y. Mark Liao, C.C. Han, G.J. Yu, Face + hair + shoulders + background ≠ face, Proceedings of the Workshop on 3D Computer Vision '97, The Chinese University of Hong Kong, 1997, pp. 91–96 (invited paper).

[20] V. Bruce, P.J.B. Hancock, A.M. Burton, Comparisons between human and computer recognition of faces, Proceedings of the Third International Conference on Automatic Face and Gesture Recognition, April 1998, pp. 408–413.

[21] A. Pentland, B. Moghaddam, T. Starner, View-based and modular eigenspaces for face recognition, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, June 1994, pp. 84–91.

[22] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, W. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Trans. Comput. 42 (1993) 300–311.

[23] L. Wiskott, J.M. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 775–779.

[24] F. Goudail, E. Lange, T. Iwamoto, K. Kyuma, N. Otsu, Face recognition system using local autocorrelations and multiscale integration, IEEE Trans. Pattern Anal. Mach. Intell. 18 (10) (1996) 1024–1028.

[25] D. Swets, J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 831–836.

[26] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ, 1988.

[27] R. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, Wiley, New York, 1992.

[28] K. Liu, Y. Cheng, J. Yang, Algebraic feature extraction for image recognition based on an optimal discriminant criterion, Pattern Recognition 26 (6) (1993) 903–911.

[29] L.F. Chen, H.Y.M. Liao, J.C. Lin, M.D. Kao, G.J. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition 33 (10) (2000) 1713–1726.

[30] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.

[31] C. Lee, D.A. Landgrebe, Feature extraction based on decision boundaries, IEEE Trans. Pattern Anal. Mach. Intell. 15 (4) (1993) 388–400.

[32] A. Bhattacharyya, On a measure of divergence between two statistical populations defined by their probability distributions, Bull. Calcutta Math. Soc. 35 (1943) 99–110.

[33] X. Tang, W.K. Stewart, Texture classification using principal component analysis techniques, Proc. SPIE - Int. Soc. Opt. Engng. 2315 (13) (1994) 22–35.

[34] G. Xuan, P. Chai, M. Wu, Bhattacharyya distance feature selection, Proceedings of the 13th International Conference on Pattern Recognition, Vol. 2, 1996, pp. 195–199.

[35] C. Lee, D. Hong, Feature extraction using the Bhattacharyya distance, IEEE International Conference on Systems, Man, and Cybernetics: Computational Cybernetics and Simulation, Vol. 3, 1997, pp. 2147–2150.

About the Author - LI-FEN CHEN received the B.S. degree in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 1993. She is currently a Ph.D. student in the Department of Computer and Information Science at National Chiao Tung University. Her research interests include image processing, pattern recognition, face recognition, computer vision, and wavelets.

About the Author - HONG-YUAN MARK LIAO received the B.S. degree in physics from National Tsing-Hua University, Hsinchu, Taiwan, in 1981, and the M.S. and Ph.D. degrees in electrical engineering from Northwestern University, Illinois, in 1985 and 1990, respectively.

He was a research associate in the Computer Vision and Image Processing Laboratory at Northwestern University during 1990-1991. In July 1991, he joined the Institute of Information Science, Academia Sinica, Taiwan, as an assistant research fellow. He was promoted to associate research fellow and then research fellow in 1995 and 1998, respectively. Currently, he is the deputy director of the same institute.

Dr. Liao's current research interests are in multimedia signal processing, wavelet-based image analysis, content-based multimedia retrieval, and multimedia protection. He was the recipient of the Young Investigators' Award of Academia Sinica in 1998; the best paper award of the Image Processing and Pattern Recognition Society of Taiwan in 1998; and the paper award of the same society in 1996 and 1999. Dr. Liao served as the program chair of the International Symposium on Multimedia Information Processing (ISMIP), 1997. He also served on the program committees of the International Symposium on Artificial Neural Networks, 1994-1995; the 1996 International Symposium on Multi-technology Information Processing; the 1998 International Conference on Tools for AI; and the 2000 International Joint Conference on Information Sciences. Dr. Liao is on the editorial boards of the IEEE Transactions on Multimedia; the International Journal of Visual Communication and Image Representation; the Acta Automatica Sinica; and the Journal of Information Science and Engineering. He is a member of the IEEE Computer Society and the International Neural Network Society (INNS).

About the Author - JA-CHEN LIN was born in 1955 in Taiwan, Republic of China. He received his B.S. degree in computer science in 1977 and M.S. degree in applied mathematics in 1979, both from National Chiao Tung University, Taiwan. In 1988 he received his Ph.D. degree in mathematics from Purdue University, U.S.A. In 1981-1982, he was an instructor at National Chiao Tung University. From 1984 to 1988, he was a graduate instructor at Purdue University. He joined the Department of Computer and Information Science at National Chiao Tung University in August 1988, and is currently a professor there. His recent research interests include pattern recognition and image processing. Dr. Lin is a member of the Phi-Tau-Phi Scholastic Honor Society.

About the Author - CHIN-CHUAN HAN received the B.S. degree in computer engineering from National Chiao Tung University in 1989, and the M.S. and Ph.D. degrees in computer science and electronic engineering from National Central University in 1991 and 1994, respectively. From 1995 to 1998, he was a postdoctoral fellow in the Institute of Information Science, Academia Sinica, Taipei, Taiwan. In 1999, he was an assistant research fellow in the Applied Research Lab., Telecommunication Laboratories, Chunghwa Telecom Co. He is currently an assistant professor in the Department of Information Management, Nan-Hua University. His research interests include face recognition, biometrics verification, 2-D image analysis, computer vision, and pattern recognition.
