Benchmark Face Detection using a Face Recognition Database


Gee-Sern Hsu, Thu Ha Tran, Sheng-Lun Chung

National Taiwan University of Science and Technology

Department of Mechanical Engineering, *Department of Electrical Engineering, 43 Sec.4 Keelung Rd., Taipei, Taiwan

ABSTRACT

A framework is proposed to generate datasets suitable for benchmarking face detection using a database meant for benchmarking face recognition. Instead of the common way of collecting images manually, the datasets from the proposed framework are made by a synthesis process with two phases: intrinsic parameterization and extrinsic parameterization. The former parameterizes the intrinsic variables that affect the appearance of a face, while the latter parameterizes the extrinsic variables that dominate how faces appear on background images as required by a test criterion. Experiments reveal that the proposed framework can generate test samples similar to those available from a popular face detection database, and also samples unavailable from existing face databases.

Index Terms— Face detection, facial database, face recognition.

1. INTRODUCTION

Most face detection algorithms are evaluated with images collected from various sources, and each image has one or a few faces in it with ground truth annotated manually. The performance of a face detection algorithm is measured by the differences between the ground truth and the faces determined by the algorithm. It is commonly acknowledged that a good face detection algorithm should be able to detect faces of different sizes, orientations, and poses, with occlusions, under various illumination conditions, and at locations anywhere in the image. It would be extremely exhausting, if not impossible, to collect test samples good enough to encompass most of these variables with a large scope of variation in each variable.

After the works by Sung and Poggio, and Rowley et al. (both published in IEEE Trans. PAMI, vol. 20, no. 1, Jan. 1998, and reviewed in [1]), their collections of sample images were considered as benchmark datasets for performance evaluation, and became known as the CMU/MIT face database.

An extensive survey made by Yang et al. [1] revealed that the CMU/MIT database was the most popular one for performance tests by 2002. We completed a survey that covered more than 20 face detection methods published from 2002 to 2009, and it showed that the CMU/MIT database was used in almost 60% of them, followed by 19% using databases made for evaluating face recognition. These surveys indicate that the majority of face detection algorithms use either one or both of the following two types of datasets for benchmarking:

Type-1: Test images are collected from various sources. Because the collected faces appear to vary across poses, illuminations, sizes, and other variables, one often tends to accept the validity of such datasets without looking into the scope of variation in the variables. Examples are the CMU/MIT database, the Kodak face dataset [2], and many personal collections which have not been released to the public.

Type-2: These are designed for benchmarking face recognition instead of face detection, for example, the FERET [3], AR [4], and PIE [5] databases and others. Most of them offer faces with many variables, and each variable covers a wide scope of variation. For example, the PIE database [5] offers 13 poses, 43 illumination conditions, and 3 expressions per subject.

Although many variables are covered in Type-1 databases, only a limited scope of variation in each variable is covered by the samples. A face detector that yields a high detection rate on one such dataset may not perform well in detecting faces with a specific test criterion, for example, faces of the same pose but in different sizes and illumination conditions.

Because Type-2 databases are designed for face recognition, most faces are similar in size and appear at a few fixed locations, with one face per image, making them inappropriate for benchmarking face detection. However, most Type-2 databases offer a wider scope of variation in many intrinsic variables than that covered by Type-1.

This paper proposes a solution to the above issues with a 2-phase framework. In the first phase it takes in faces from a Type-2 database, known as the mother database, with intrinsic variables as required by a test criterion. In the second phase it generates daughter datasets with extrinsic variables, also as required by the test criterion. The daughter datasets can be made similar to those used for a generic evaluation, or with some specific scope of variables for a special test.

The framework is presented in Section 2. To demonstrate its capacity in generating datasets larger than existing benchmark databases, Section 3 presents experimental validations following two scenarios: one generates a daughter dataset close to the popular CMU/MIT database; the other generates daughter datasets unavailable from existing databases.

Proceedings of 2010 IEEE 17th International Conference on Image Processing (ICIP 2010), September 26-29, 2010, Hong Kong. 978-1-4244-7994-8/10/$26.00 ©2010 IEEE.

2. THE PROPOSED FRAMEWORK

The variables that affect the performance of a face detection algorithm can be split into two categories, intrinsic and extrinsic. The intrinsic variables can alter the appearance of a face, such as pose, expression, illumination, gender, and accessories such as glasses and hats. The extrinsic variables determine how faces appear in an image, for example, the size, number, and spatial distribution of the faces across the image.

The proposed framework is thus designed with two phases in it: the Intrinsic Parameterization (or IP for short), and the Extrinsic Parameterization and Spatial Distribution (EPSD).

The IP begins with cropping faces from a mother database, and ends with a Parameterized Face Database (PFD), a large collection of parameterized faces. Given a test criterion, the EPSD selects the matched samples from the PFD and generates the required daughter datasets.

2.1. Intrinsic Parameterization

The ideal candidates for the mother database must include many intrinsic variables with a wide scope of variation in each variable. Pose and illumination are considered two major challenging variables in most applications, and if the number of individuals and expressions are also taken into account, the PIE [5] and CAS-PEAL [6] databases can be the best candidates. The PIE database is selected as an example in this study.

[Figure 1: flowchart. Mother Database (face recognition benchmark DB) → Cropped Faces → Pose Normalization → Illumination Clustering → Sample Add-on using Illumination Cone → Parameterized Face Database (PFD). Each face is tagged by parameters in the parameter space of facial variables (pose, lighting): Subj_01 ~ Subj_Ns, Pose_01 ~ Pose_Np, Exp_01 ~ Exp_Ne, Light_01 ~ Light_Nl.]

Fig. 1. Flowchart of the Intrinsic Parameterization.

The IP's flowchart is shown in Fig. 1 with three modules: (1) size normalization and pose parameterization, (2) illumination clustering, and (3) sample add-on using illumination cone [7] and parameterization of other intrinsic variables.

Size normalization and pose parameterization

The faces from the PIE database can be split into two major pose variation groups: one in a chin-up to chin-down vertical pattern and the other in a left-profile to right-profile horizontal pattern. The former can be normalized to $D_h$, the horizontal baseline between both eyes; and the latter can be normalized to $D_v$, the vertical baseline between the eyes and chin. Those with pose variation in both the vertical and horizontal directions, such as the poses tagged with C25 and C31 in PIE, are normalized with tilted $D_h$ and $D_v$. After size normalization, the faces are segmented and parameterized according to the pose with the same longitude and latitude.

The faces in vertical variations are parameterized into $(\phi_i, 0)$, where $\phi_i$ is the longitude of the chin-up (or chin-down) pose measured from the frontal, $(0, 0)$, and $i = 1$ or $-1$ for chin-up and chin-down, respectively. Those in horizontal variations are parameterized into $(0, \theta_j)$, where $\theta_j$ is the latitude of a sided pose, and $j = \pm 1, \pm 2, \pm 3, \pm 4$. The poses C25 and C31 are parameterized as $(\phi_{-1}, \theta_{-3})$ and $(\phi_{-1}, \theta_{3})$, respectively.

The parameterized poses are limited by the 13 pairs of $(\phi_i, \theta_j)$ available from the PIE mother database. The illumination cone [7] is applied to generate additional poses so that the PFD can have more poses to offer. An introduction to the illumination cone will be given later in this section.
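As an illustration, the pose index pairs described above can be enumerated in a few lines. This is only a sketch: it assumes that the frontal pose $(0, 0)$ is counted among the 13 pairs, and the constant and function names are hypothetical, not taken from the paper.

```python
from itertools import chain

# Index pairs (i, j): i indexes the vertical angle phi_i, j the horizontal
# angle theta_j; index 0 stands for the frontal direction.
# Assumption: the frontal pose (0, 0) is one of the 13 pairs.
FRONTAL = [(0, 0)]
VERTICAL = [(1, 0), (-1, 0)]                       # chin-up, chin-down
HORIZONTAL = [(0, j) for j in (-4, -3, -2, -1, 1, 2, 3, 4)]
COMBINED = {"C25": (-1, -3), "C31": (-1, 3)}       # the two poses named in the text


def pose_classes():
    """Return every (i, j) pose index pair available from the mother database."""
    return list(chain(FRONTAL, VERTICAL, HORIZONTAL, COMBINED.values()))


poses = pose_classes()
print(len(poses))  # 13
```

Poses synthesized later with the illumination cone would extend this list with further index pairs.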

Illumination clustering and parameterization

The sample mother database, PIE, defines 43 illumination conditions. While it is possible to assign 43 parameters to them, we reduce this number by clustering those with similar patterns. Most illumination conditions in PIE vary from side to side, instead of top to bottom. A supervised clustering scheme is applied to segment the illumination into 10 clusters in each of the aforementioned 13 pose classes. It is composed of the following steps. Step 1: 10 different illumination conditions in each pose class are manually selected as initial templates according to their visual patterns, with reference to the flash system reported in the PIE setup [5]. Step 2: the rest of the illumination conditions are clustered using EM on the low-frequency DCT coefficients extracted from the faces. Merged with the pose parameters $(\phi_i, \theta_j)$, each face from the mother database can now be parameterized by $(\phi_i, \theta_j, i_k)$, where $i_k$ is the $k$-th illumination condition and $k = 1, 2, \ldots, 10$.
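The two steps can be sketched as follows. The paper does not specify how many DCT coefficients are kept or which mixture model the EM step fits, so this sketch assumes a 4 x 4 low-frequency block and an equal-weight spherical Gaussian mixture whose means start at the hand-picked templates. All function names are hypothetical, and the demo uses two synthetic lighting patterns rather than the 10 clusters per pose class.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m


def low_freq_dct(face, keep=4):
    """Flattened top-left keep x keep block of the 2D DCT of a grayscale face."""
    h, w = face.shape
    coeffs = dct_matrix(h) @ face @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()


def seeded_em(features, seeds, iters=20, var=1.0):
    """EM for an equal-weight spherical Gaussian mixture seeded with templates."""
    means = np.array(seeds, dtype=float)
    for _ in range(iters):
        # E-step: soft responsibilities from squared distances to each mean.
        d2 = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        logr = -0.5 * d2 / var
        logr -= logr.max(axis=1, keepdims=True)
        resp = np.exp(logr)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means (epsilon guards empty clusters).
        means = (resp.T @ features) / (resp.sum(axis=0)[:, None] + 1e-12)
    return resp.argmax(axis=1), means


# Synthetic stand-ins for faces lit from the left and from the right.
rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 1.0, 16)
left_lit = np.tile(ramp[::-1], (16, 1))
right_lit = np.tile(ramp, (16, 1))
faces = [left_lit + 0.05 * rng.standard_normal((16, 16)) for _ in range(5)]
faces += [right_lit + 0.05 * rng.standard_normal((16, 16)) for _ in range(5)]

features = np.array([low_freq_dct(f) for f in faces])
seeds = [low_freq_dct(left_lit), low_freq_dct(right_lit)]  # Step 1: templates
labels, _ = seeded_em(features, seeds)                     # Step 2: EM
print(labels)
```

Seeding the means with manually chosen templates is what makes the scheme supervised: the cluster index keeps the meaning of the template that initialized it.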

Illumination cone and other intrinsic parameters

Several approaches, for example the 3D morphable model and the illumination cone [7], use a few sample faces to generate faces with different illumination conditions and poses. The illumination cone is selected in this work for its relatively inexpensive computation. It exploits the fact that the set of facial images with the same pose, but taken under different lighting conditions, forms a convex cone in the image space. Using a few training samples of each face under different illumination conditions, the shape and albedo of the face can be reconstructed using the Generalized Bas-Relief (GBR) transformation. This reconstruction leads to a generative model able to render or synthesize the images of the face under novel poses and illumination conditions. The requirement of training images with the same pose but different illumination can be readily met by the samples from the mother database.

In addition to pose and illumination, expression, orientation (in-plane rotation), gender, and accessories such as glasses, hats, and masks are also considered as intrinsic parameters. Gender takes 1 for male and 0 for female. Limited by the 3 expressions offered by the PIE mother database, the expression parameter in the PFD can only take 1 for smile, 0 for neutral, and −1 for blinking. The orientation can take values from $-180^\circ$ to $180^\circ$, with the upright pose at $0^\circ$ and clockwise taken as positive. The occlusion is assigned 0 for no occlusion, 1 for sunglasses, 2 for mask, and 3 for hat. A few templates of these accessories, collected from various sources including the AR database, are made available in this phase.

In summary, the IP phase leads to a PFD with each face parameterized in $\Omega_p = [\phi_i, \theta_j, i_k, o_l, g_\pm, e_m, c_n]$, where $\phi_i$ and $\theta_j$ specify pose-$(i, j)$, $i_k$ stands for illumination-$k$, $o_l$ for orientation-$l$, $g_\pm$ for gender, $e_m$ for expression-$m$, and $c_n$ for occlusion-$n$.
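One way to picture a PFD entry is as a record carrying the fields of $\Omega_p$, with a test criterion expressed as predicates on those fields. The sketch below is illustrative only: the class, field, and function names are hypothetical, and it assumes that $j = \pm 4$ denotes the full profile pose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceParams:
    """One PFD entry tagged with the intrinsic parameter vector Omega_p."""
    phi: int        # vertical pose index i
    theta: int      # horizontal pose index j
    illum: int      # illumination cluster k, 1..10
    orient: float   # in-plane rotation in degrees, clockwise positive
    gender: int     # 1 male, 0 female
    expr: int       # 1 smile, 0 neutral, -1 blinking
    occl: int       # 0 none, 1 sunglasses, 2 mask, 3 hat


def select(pfd, **criterion):
    """Return the entries whose fields satisfy every predicate in the criterion."""
    return [f for f in pfd
            if all(test(getattr(f, name)) for name, test in criterion.items())]


pfd = [
    FaceParams(0, 0, 1, 0.0, 1, 0, 0),
    FaceParams(0, 4, 3, 20.0, 0, 1, 0),
    FaceParams(0, -4, 3, 45.0, 1, 0, 1),
]

# Assumed criterion: profile pose with in-plane rotation between 0 and 30 degrees.
profiles = select(pfd,
                  theta=lambda j: abs(j) == 4,
                  orient=lambda o: 0 <= o <= 30)
print(len(profiles))  # 1
```

Using predicates rather than exact values lets a criterion fix one variable while leaving a range open on another.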

2.2. Extrinsic Parameterization and Spatial Distribution

This phase is abbreviated as the EPSD phase, and the flowchart with the tasks involved is given in Fig. 2. Given a desired evaluation criterion, the faces with the required intrinsic parameters will be extracted from the PFD, and merged on background images according to the required extrinsic parameters. The extrinsic parameters include the size, number, and spatial distribution pattern of the faces, and the background images.

[Figure 2: flowchart. A user-defined test criterion supplies intrinsic parameters to Subset Sampling from the Parameterized Face Database (PFD), and extrinsic parameters to Size and Spatial Distribution (with Optional Perspective Modeling) and Background Generation. The Pasting Process combines them into the Face Detection Benchmark Dataset (FDBD).]

Fig. 2. Flowchart of the Extrinsic Parameterization.

The faces selected from the PFD can be distributed randomly across a set of background images, with the range of sizes and the number of faces of each size specified in the evaluation criterion. However, for certain applications the 2D face distribution on background images must reflect the distribution of 3D faces in a 3D scene. The framework, therefore, takes both into account, and is designed with the following two options:

Option-1: One can specify the total number of faces needed and the size range with the desired distribution.

Option-2: With a chosen focal length, one can specify the number of faces located at different distances from the camera. Perspective modeling with an assumed normal face/head model for each face is used to obtain its projection from the 3D scene onto the 2D image. Constraints are imposed to verify the validity and realizability of a given specification.
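Option-2 can be illustrated with a pinhole camera model, under which a face of physical width $W$ at depth $Z$ projects to about $fW/Z$ pixels for a focal length $f$ given in pixels. The paper does not state its face/head model or its constraints, so the nominal face width of 0.16 m, the minimum size of 20 pixels, and the function names below are assumptions made for this sketch.

```python
def project_face(x_m, z_m, focal_px, image_width_px, face_width_m=0.16):
    """Pinhole projection of a face at lateral offset x_m and depth z_m (metres).

    Returns the horizontal centre and the side length of the face in pixels.
    """
    if z_m <= 0:
        raise ValueError("the face must be in front of the camera")
    size_px = focal_px * face_width_m / z_m
    center_u = image_width_px / 2 + focal_px * x_m / z_m
    return center_u, size_px


def is_realizable(center_u, size_px, image_width_px, min_size_px=20):
    """Check that the projected face is large enough and fully inside the image."""
    inside = (center_u - size_px / 2 >= 0
              and center_u + size_px / 2 <= image_width_px)
    return inside and size_px >= min_size_px


# A face 0.5 m to the right of the optical axis and 2 m away, seen with an
# 800-pixel focal length on a 640-pixel-wide image.
u, s = project_face(0.5, 2.0, 800.0, 640)
print(round(u), round(s), is_realizable(u, s, 640))

# The same face 8 m away projects below the assumed minimum size.
u_far, s_far = project_face(0.0, 8.0, 800.0, 640)
print(round(s_far), is_realizable(u_far, s_far, 640))
```

A specification that asks for faces too far from the camera, or too far off-axis, fails the check and would be rejected before any image is generated.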

Background images can be cluttered or uncluttered, natural or artificial, and the color and intensity of each background can be changed for different needs.
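The pasting process under Option-1 can be sketched as random placement on a background, with the ground-truth boxes recorded as each face is pasted. This is a simplified illustration, not the authors' implementation: it works on grayscale arrays, forbids overlap between faces (the framework itself also allows partial occlusion), resizes by nearest-neighbour sampling, and uses hypothetical names throughout.

```python
import numpy as np


def paste_faces(background, faces, sizes, rng, max_tries=100):
    """Paste square grayscale faces at random non-overlapping positions.

    Returns the composed image and the ground-truth boxes as (x, y, side).
    A face that cannot be placed within max_tries attempts is skipped.
    """
    image = background.copy()
    h, w = image.shape
    boxes = []
    for face, side in zip(faces, sizes):
        for _ in range(max_tries):
            x = int(rng.integers(0, w - side + 1))
            y = int(rng.integers(0, h - side + 1))
            overlaps = any(x < bx + bs and bx < x + side and
                           y < by + bs and by < y + side
                           for bx, by, bs in boxes)
            if not overlaps:
                # Nearest-neighbour resize of the face to side x side pixels.
                rows = np.arange(side) * face.shape[0] // side
                cols = np.arange(side) * face.shape[1] // side
                image[y:y + side, x:x + side] = face[np.ix_(rows, cols)]
                boxes.append((x, y, side))
                break
    return image, boxes


rng = np.random.default_rng(0)
H, W = 240, 320
background = np.zeros((H, W))
faces = [rng.random((32, 32)) for _ in range(3)]
# Side lengths drawn between 0.2H and 0.3H, as in one of the test criteria.
sizes = [int(s) for s in rng.integers(H // 5, 3 * H // 10 + 1, size=3)]

image, boxes = paste_faces(background, faces, sizes, rng)
print(boxes)
```

Because the boxes are produced by the same routine that composes the image, the ground truth needs no manual annotation.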

3. EXPERIMENTS AND DISCUSSION

Two scenarios are tested to show the validity of the proposed framework. The first shows that it can generate test images similar to those offered by the popular CMU/MIT face database. The second shows that it can generate images unavailable from most face detection databases.

Given an arbitrary image from the CMU/MIT database, the features on each face are first marked manually, and the pose of each face is determined by the longitude and latitude computed from the marked facial features. Different poses have different sets of features for determining the longitudes and latitudes. From the PFD faces with the same or similar poses, only the ones with similar illumination, as judged by the low-frequency DCT components, are selected as similar faces. Fig. 3 shows two typical samples. The originals from the CMU/MIT database are on the left, and the same images with faces replaced by similar ones from the PFD are on the right. Out of the 346 images in the CMU/MIT database, our experiments show that 97.7% of the faces in these images can be replaced by those available from the PFD. A few with excessive expressions are considered unreplaceable.

[Figure 3: two sample image pairs, with panels labeled "Original" and "Faces replaced".]

Fig. 3. Originals from the CMU/MIT database are on the left, and with faces replaced on the right.


The second scenario is demonstrated in Figs. 4 and 5. Fig. 4 shows samples taken from two daughter sets with different user-defined criteria. The parameters for the left one include profile pose $\pm 90^\circ$, lit on the right with ambient light on, orientation between $0^\circ$ and $30^\circ$, and size between $(0.2H)^2$ and $(0.3H)^2$, where $H$ is the image's height. The parameters for the right one include frontal pose, uniform illumination, orientation $< 10^\circ$, and size varying from $(0.08H)^2$ to $(0.3H)^2$ successively, with 10% occlusion on the smaller face behind. Fig. 5 shows a sample generated based on a desired 3D spatial distribution of the faces, as revealed by the top view of the spatial distribution on the right. The samples in Figs. 4 and 5 are not readily available from existing face detection databases.

Fig. 4. Left, with parameters [Pose = $\pm 90^\circ$, Illum = Right lit, Orien < $\pm 30^\circ$, Size = $(0.2 \sim 0.3H)^2$]; right, with parameters [Pose = $0^\circ$, Illum = Even, Orien < $\pm 10^\circ$, Size = $(0.08 \sim 0.3H)^2$].

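[Figure 5: generated sample image with a top-view plot of the face distribution; axis labels x, y, and D.]

Fig. 5. Faces are distributed according to a desired 3D spatial distribution, as designed on the right.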

3.1. Discussion on Application Potentials

A couple of issues about the application scope of the proposed framework deserve special attention:

Appropriate for scanning-window face detection: Most, if not all, face detectors are built with moving windows scanning across the given image to locate candidate facial regions. Each candidate region is then verified by a classifier for decision making. The daughter datasets serve well for benchmarking such region-based detectors.

Beyond detecting faces in real scenes: Human eyes detect faces regardless of background, illumination, size, pose, and other intrinsic and extrinsic variables. However, with certain variables, detection by human eyes slows down and the detection rate deteriorates. The daughter datasets from the framework are not just targeted at evaluating the detection of faces from real scenes, but also from artificial images, such as photos taken in a studio where artificial backgrounds are often used for various purposes. An excellent face detector is expected to work well in both artificial and natural scenes, yielding performance similar to that of human vision.

4. CONCLUSION

Different from all previously published databases for benchmarking face detection, the proposed framework takes advantage of a face recognition benchmark database, and generates datasets that can meet the requirements of different test criteria. The generated datasets can be used to evaluate the performance with a single variable or multiple variables, and the scope of variation in each variable can be specified as required. This leads to a unique contribution: the performance specification of a face detector can be quantitatively described, and the directions for improving a face detector can be precisely given. In addition, the ground truth in the generated datasets is determined at the time when the datasets are generated, so there is no need for manual annotation. This can be the most user-friendly feature offered by the proposed framework, as it avoids the most time-consuming task of manually annotating the ground truth in each test image.

5. REFERENCES

[1] M.-H. Yang, D.J. Kriegman, and N. Ahuja, “Detecting faces in images: A survey,” IEEE Trans. PAMI, vol. 24, pp. 34–58, Jan. 2002.

[2] A.C. Loui, C.N. Judice, and S. Liu, “An image database for benchmarking of automatic face detection and recognition algorithms,” in IEEE Proc. Int. Conf. Image Processing, 1998, vol. 1, pp. 146–150.

[3] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, “The FERET evaluation methodology for face recognition,” IEEE Trans. PAMI, vol. 22, pp. 1090–1104, Oct. 2000.

[4] A. Martinez and R. Benavente, “The AR face database,” Tech. Report CVC 24, Purdue Univ., 1998.

[5] T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression database,” IEEE Trans. PAMI, vol. 25, pp. 1615–1618, Dec. 2003.

[6] W. Gao, B. Cao, S. Shan, X. Chen, D. Zhou, X. Zhang, and D. Zhao, “The CAS-PEAL large-scale Chinese face database and baseline evaluations,” IEEE Trans. SMC-Part B, vol. 38, pp. 149–161, Jan. 2008.
