BENCHMARK FACE DETECTION USING A FACE RECOGNITION DATABASE
Gee-Sern Hsu, Thu Ha Tran*, Sheng-Lun Chung*
National Taiwan University of Science and Technology
Department of Mechanical Engineering, *Department of Electrical Engineering, 43 Sec.4 Keelung Rd., Taipei, Taiwan
ABSTRACT
A framework is proposed to generate datasets suitable for benchmarking face detection from databases meant for benchmarking face recognition. Instead of the common way of collecting images manually, the datasets from the proposed framework are made by a synthesis process with two phases: intrinsic parameterization and extrinsic parameterization. The former parameterizes the intrinsic variables that affect the appearance of a face, while the latter parameterizes the extrinsic variables that govern how faces appear on background images as required by a test criterion. Experiments reveal that the proposed framework can generate test samples similar to those available from a popular face detection database, as well as samples unavailable from existing face databases.
Index Terms— Face detection, facial database, face recognition.
1. INTRODUCTION
Most face detection algorithms are evaluated with images collected from various sources, and each image has one or a few faces in it with ground truth annotated manually. The performance of a face detection algorithm is measured by the differences between the ground truth and the faces located by the algorithm. It is commonly acknowledged that a good face detection algorithm should be able to detect faces of different sizes, orientations, and poses, with occlusions, under various illumination conditions, and at locations anywhere in the image. It would be extremely laborious, if not impossible, to collect test samples good enough to encompass most of these variables with a large scope of variation in each variable.
After the works by Sung and Poggio, and by Rowley et al. (both reviewed in [1]; both can be found in IEEE Trans. PAMI, vol. 20, no. 1, Jan. 1998), their collections of sample images were considered benchmark datasets for performance evaluation, and became known as the CMU/MIT face database.
An extensive survey made by Yang et al. [1] revealed that the CMU/MIT database was the most popular one for performance tests by 2002. We completed a survey that covered more than 20 face detection methods published from 2002 to 2009, and it showed that the CMU/MIT database was used in almost 60% of them, followed by 19% using databases made for evaluating face recognition. These surveys indicate that the majority of face detection algorithms use either one or both of the following two types of datasets for benchmarking:
Type-1: Test images are collected from various sources. Because the collected faces appear to vary across poses, illuminations, sizes, and other variables, one often tends to accept the validity of such datasets without looking into the scope of variation in the variables. Examples are the CMU/MIT database, the Kodak face dataset [2], and many personal collections which have not been released to the public.
Type-2: These are designed for benchmarking face recognition instead of face detection, for example, the FERET [3], AR [4], and PIE [5] databases, among others. Most of them offer faces with many variables, and each variable covers a wide scope of variation. For example, the PIE database [5] offers 13 poses, 43 illumination conditions, and 3 expressions per subject.
Although many variables are covered in Type-1 databases, only a limited scope of variation in each variable is covered by the samples. A face detector that yields a high detection rate on one such dataset may not perform well in detecting faces with a specific test criterion, for example, faces of the same pose but in different sizes and illumination conditions.
Because Type-2 databases are designed for face recognition, most faces are similar in size and appear at a few fixed locations, with one face per image, making them inappropriate for benchmarking face detection. However, most Type-2 databases offer a wider scope of variation in many intrinsic variables than that covered by Type-1.
This paper proposes a solution to the above issues with a 2-phase framework. In the first phase it takes in faces from a Type-2 database, known as the mother database, with intrinsic variables as required by a test criterion. In the second phase it generates daughter datasets with extrinsic variables, also as required by the test criterion. The daughter datasets can be made similar to those used for a generic evaluation, or with some specific scope of variables for a special test.
The framework is presented in Section 2. To demonstrate its capacity in generating datasets larger than existing benchmark databases, Section 3 presents experimental validations following two scenarios: one generates a daughter dataset close to the popular CMU/MIT database; the other generates daughter datasets unavailable from existing databases.
Proceedings of 2010 IEEE 17th International Conference on Image Processing (ICIP 2010), September 26-29, 2010, Hong Kong. 978-1-4244-7994-8/10/$26.00 ©2010 IEEE.
2. THE PROPOSED FRAMEWORK
The variables that affect the performance of a face detection algorithm can be split into two categories, intrinsic and extrinsic. The intrinsic variables alter the appearance of a face, such as pose, expression, illumination, gender, and accessories such as glasses and hats. The extrinsic variables determine how faces appear in an image, for example, the size, number, and spatial distribution of the faces across the image.
The proposed framework is thus designed with two phases in it: the Intrinsic Parameterization (or IP for short), and the Extrinsic Parameterization and Spatial Distribution (EPSD).
The IP begins with cropping faces from a mother database, and ends with a Parameterized Face Database (PFD), a large collection of parameterized faces. Given a test criterion, the EPSD selects the matching samples from the PFD and generates the required daughter datasets.
2.1. Intrinsic Parameterization
The ideal candidates for the mother database must include many intrinsic variables with a wide scope of variation in each variable. Pose and illumination are considered the two most challenging variables in most applications, and if the numbers of individuals and expressions are also taken into account, the PIE [5] and CAS-PEAL [6] databases can be considered the best candidates.
The PIE database is selected as an example in this study.
[Figure 1 diagram: Mother Database (Face Recognition Benchmark DB), Cropped Faces, Pose Normalization, Illumination Clustering, Sample Add-on using Illumination Cone, Parameterized Face Database (PFD). Each face is tagged by parameters in the parameter space of facial variables: Subj_01 ~ Subj_Ns, Pose_01 ~ Pose_Np, Exp_01 ~ Exp_Ne, Light_01 ~ Light_Nl.]
Fig. 1. Flowchart of the Intrinsic Parameterization.
The IP's flowchart is shown in Fig. 1 with three modules: (1) size normalization and pose parameterization, (2) illumination clustering, and (3) sample add-on using illumination cone [7] and parameterization of other intrinsic variables.
Size normalization and pose parameterization
The faces from the PIE database can be split into two major pose variation groups: one follows a chin-up to chin-down vertical pattern, and the other a left-profile to right-profile horizontal pattern. The former can be normalized to $D_h$, the horizontal baseline between both eyes; and the latter can be normalized to $D_v$, the vertical baseline between the eyes and chin. Those with pose variation in both the vertical and horizontal directions, such as the poses tagged with C25 and C31 in PIE, are normalized with tilted $D_h$ and $D_v$. After size normalization, the faces are segmented and parameterized according to the pose with the same longitude and latitude.
The faces in vertical variations are parameterized into $(\phi_i, 0)$, where $\phi_i$ is the longitude of the chin-up (or chin-down) pose measured from the frontal pose, $(0, 0)$, and $i = 1$ or $-1$ for chin-up and chin-down, respectively. Those in horizontal variations are parameterized into $(0, \theta_j)$, where $\theta_j$ is the latitude of a sided pose, and $j = \pm 1, \pm 2, \pm 3, \pm 4$. The poses C25 and C31 are parameterized as $(\phi_{-1}, \theta_{-3})$ and $(\phi_{-1}, \theta_{3})$, respectively.
The parameterized poses are limited by the 13 pairs of $(\phi_i, \theta_j)$ available from the PIE mother database. The illumination cone [7] is applied to generate additional poses so that the PFD can have more poses to offer. An introduction to the illumination cone will be given later in this section.
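The pose indexing described above can be written down as a short enumeration. This is only a sketch: it stores the index pair $(i, j)$ rather than the angles $\phi_i$ and $\theta_j$ (whose values are not given here), it assumes the frontal pose $(0, 0)$ counts as one of the 13 pairs, and the function name `pie_pose_indices` is hypothetical.

```python
def pie_pose_indices():
    """Enumerate the (i, j) index pairs of the 13 poses described in the text."""
    poses = [(0, 0)]                                    # frontal pose
    poses += [(i, 0) for i in (1, -1)]                  # chin-up, chin-down
    poses += [(0, j) for j in range(-4, 5) if j != 0]   # eight sided poses
    poses += [(-1, -3), (-1, 3)]                        # C25 and C31
    return poses


poses = pie_pose_indices()
print(len(poses))
```

Counting one frontal pose, two vertical poses, eight horizontal poses, and the two combined poses gives the 13 pairs mentioned in the text.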
Illumination clustering and parameterization
The sample mother database, PIE, defines 43 illumination conditions. While it is possible to assign 43 parameters to them, we reduce this number by clustering those with similar patterns. Most illumination conditions in PIE vary from side to side, rather than from top to bottom. A supervised clustering scheme is applied to segment the illumination into 10 clusters in each of the aforementioned 13 pose classes. It is composed of the following steps. Step 1: 10 different illumination conditions in each pose class are manually selected as initial templates according to their visual patterns, with reference to the flash system reported in the PIE setup [5]. Step 2: the rest of the illumination conditions are clustered using EM on the low-frequency DCT coefficients extracted from the faces. Merged with the pose parameters $(\phi_i, \theta_j)$, each face from the mother database can now be parameterized by $(\phi_i, \theta_j, i_k)$, where $i_k$ is the $k$-th illumination condition and $k = 1, 2, \ldots, 10$.
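The two steps above can be sketched as follows. The text states only that EM is run on low-frequency DCT coefficients with manually chosen templates; the spherical Gaussian mixture, the 4 x 4 block of retained coefficients, the synthetic ramp images, and the names `dct_matrix`, `low_freq_dct`, and `seeded_em` are all assumptions made for illustration.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m


def low_freq_dct(face, keep=4):
    """Return the keep x keep lowest-frequency 2D DCT coefficients as a vector."""
    h, w = face.shape
    coeffs = dct_matrix(h) @ face @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()


def seeded_em(features, seeds, n_iter=20):
    """EM for a spherical Gaussian mixture whose means start at the seed vectors."""
    x = np.asarray(features, dtype=float)
    means = np.asarray(seeds, dtype=float).copy()
    k, d = means.shape
    var = np.full(k, x.var() + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities computed from log densities
        sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update means, variances, and mixing weights
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ x) / nk[:, None]
        sq = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * nk) + 1e-6
        weights = nk / nk.sum()
    return resp.argmax(axis=1), means


# Synthetic stand-ins for faces: brightness ramps lit from the left or the right.
rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
left_lit = [ramp[:, ::-1] + 0.01 * rng.standard_normal((8, 8)) for _ in range(5)]
right_lit = [ramp + 0.01 * rng.standard_normal((8, 8)) for _ in range(5)]

features = np.array([low_freq_dct(f) for f in left_lit + right_lit])
seeds = np.array([low_freq_dct(ramp[:, ::-1]), low_freq_dct(ramp)])  # "templates"
labels, _ = seeded_em(features, seeds)
```

Seeding the means with the template features plays the role of the manual selection in Step 1, so that each cluster index keeps the meaning assigned to its template.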
Illumination cone and other intrinsic parameters
Several approaches, for example the 3D morphable model and the illumination cone [7], use a few sample faces to generate faces with different illumination conditions and poses.
The illumination cone is selected in this work for its relatively inexpensive computation. It exploits the fact that a set of facial images with the same pose, but taken under different lighting conditions, forms a convex cone in the image space. Using a few training samples of each face under different illumination conditions, the shape and albedo of the face can be reconstructed using the Generalized Bas-Relief (GBR) transformation. This reconstruction leads to a generative model able to render or synthesize images of the face under novel poses and illumination conditions. The requirement of training images with the same pose but different illumination can be readily met by the samples from the mother database.
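The convex-cone property can be illustrated with a toy rendering model. The sketch below assumes a Lambertian surface with attached shadows only, where an image is $\max(Bs, 0)$ for a matrix $B$ of albedo-scaled normals and a light vector $s$; this is the usual setting of the illumination-cone literature rather than something stated in the text. It does not perform the GBR reconstruction: $B$ is random and taken as known, and `render` is a hypothetical helper.

```python
import numpy as np


def render(b, light):
    """Lambertian image with attached shadows: max(B s, 0) per pixel.

    b     : (n_pixels, 3) array, each row is albedo times the unit surface normal
    light : (3,) array, light direction scaled by its intensity
    """
    return np.maximum(b @ light, 0.0)


# Toy surface: random unit normals and albedos (not a real face).
rng = np.random.default_rng(1)
normals = rng.standard_normal((100, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=(100, 1))
b = albedo * normals

s1 = np.array([0.5, 0.0, 1.0])
s2 = np.array([-0.5, 0.2, 1.0])
img1, img2 = render(b, s1), render(b, s2)

# A non-negative combination of two images equals the image produced by the
# two sources lit together with rescaled intensities, so it stays in the set.
combined = 0.7 * img1 + 1.3 * img2
two_sources = render(b, 0.7 * s1) + render(b, 1.3 * s2)
```

Because scaling a light scales its image and simultaneous sources add, every non-negative combination of images is again an image of the same surface, which is what makes the set a convex cone.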
In addition to pose and illumination, expression, orientation (in-plane rotation), gender, and accessories such as glasses, hats and masks are also considered as intrinsic parameters.
Gender takes 1 for male, and 0 for female. Limited by the 3 expressions offered by the PIE mother database, the expression parameters in the PFD can only be given 1 for smile, 0 for neutral, and $-1$ for blinking. The orientation can take values from $-180^\circ$ to $180^\circ$, with the upright pose taken as $0^\circ$ and clockwise rotation positive. The occlusion is assigned 0 for no occlusion, 1 for sunglasses, 2 for mask, and 3 for hat. A few templates of these accessories, collected from various sources including the AR database, are made available in this phase.
In summary, the IP phase leads to a PFD with each face parameterized in $\Omega_p = [\phi_i, \theta_j, i_k, o_l, g_\pm, e_m, c_n]$, where $\phi_i$ and $\theta_j$ specify pose-$(i, j)$, $i_k$ illumination-$k$, $o_l$ orientation-$l$, $g_\pm$ gender, $e_m$ expression-$m$, and $c_n$ occlusion-$n$.
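One possible in-memory form of the parameter vector $\Omega_p$ is a small record type. The class name `FaceParams` and its field names are hypothetical. Only the value ranges stated in the text are checked, and the pose indices are left unconstrained because the illumination cone may add poses beyond the original 13.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceParams:
    """One face's intrinsic parameters; field names are illustrative only."""
    pose_i: int         # index i of phi_i (vertical pose)
    pose_j: int         # index j of theta_j (horizontal pose)
    illum: int          # illumination cluster k, 1..10
    orientation: float  # in-plane rotation in degrees, clockwise positive
    gender: int         # 1 male, 0 female
    expression: int     # 1 smile, 0 neutral, -1 blinking
    occlusion: int      # 0 none, 1 sunglasses, 2 mask, 3 hat

    def __post_init__(self):
        checks = [
            (1 <= self.illum <= 10, "illum"),
            (-180.0 <= self.orientation <= 180.0, "orientation"),
            (self.gender in (0, 1), "gender"),
            (self.expression in (-1, 0, 1), "expression"),
            (self.occlusion in (0, 1, 2, 3), "occlusion"),
        ]
        for ok, name in checks:
            if not ok:
                raise ValueError(f"{name} out of range")


# A face in pose C25, (phi_-1, theta_-3); the other values are made up.
c25_face = FaceParams(pose_i=-1, pose_j=-3, illum=4, orientation=0.0,
                      gender=1, expression=0, occlusion=0)
```

A frozen dataclass keeps the tag immutable once a face has been parameterized, which suits a database that is built once and then only queried.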
2.2. Extrinsic Parameterization and Spatial Distribution
This phase is abbreviated as the EPSD phase, and the flowchart with the tasks involved is given in Fig. 2. Given a desired evaluation criterion, the faces with the required intrinsic parameters are extracted from the PFD and merged onto background images according to the required extrinsic parameters.
The extrinsic parameters include the size, number, and spatial distribution pattern of the faces, and the background images.
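Extracting the faces that match a criterion amounts to filtering the PFD on its intrinsic parameters. The sketch below assumes each PFD entry is a dictionary and that a criterion maps a parameter name to either a set of allowed values or an inclusive range; this representation and the name `select_faces` are hypothetical.

```python
def select_faces(pfd, criterion):
    """Return the PFD entries whose parameters satisfy every constraint.

    criterion maps a parameter name to either a set of allowed values
    or a (low, high) tuple giving an inclusive range.
    """
    def satisfies(value, rule):
        if isinstance(rule, tuple):
            low, high = rule
            return low <= value <= high
        return value in rule

    return [face for face in pfd
            if all(satisfies(face[name], rule) for name, rule in criterion.items())]


# Made-up entries standing in for parameterized faces.
pfd = [
    {"id": "a", "pose_j": 4, "illum": 9, "orientation": 15.0},
    {"id": "b", "pose_j": 0, "illum": 5, "orientation": 0.0},
    {"id": "c", "pose_j": -4, "illum": 9, "orientation": 45.0},
]

# Profile poses only, with in-plane rotation between 0 and 30 degrees.
criterion = {"pose_j": {-4, 4}, "orientation": (0.0, 30.0)}
selected = select_faces(pfd, criterion)
```

Here only entry "a" passes: "b" is frontal, and "c" is rotated beyond the allowed range.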
[Figure 2 diagram: a user-defined test criterion supplies the intrinsic and extrinsic parameters. Parameterized Face Database (PFD), Subset Sampling, Size and Spatial Distribution (with optional Perspective Modeling), Background Generation, Pasting Process, Face Detection Benchmark Dataset (FDBD).]
Fig. 2. Flowchart of the Extrinsic Parameterization.
The faces selected from the PFD can be distributed randomly across a set of background images, with the range of sizes and the number of faces of each size specified in the evaluation criterion. However, for certain applications the 2D face distribution on background images must reflect the distribution of 3D faces in a 3D scene. The framework therefore takes both into account, and is designed with the following two options. Option-1: One can specify the total number of faces needed and the size range with the desired distribution. Option-2: With a chosen focal length, one can specify the number of faces located at different distances from the camera. Perspective modeling, with an assumed normal face/head model for each face, is used to project it from 3D onto a 2D image. Constraints are imposed to verify the validity and realizability of a given specification.
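Option-2 can be sketched with a pinhole camera, under which a face of physical height $S$ at distance $Z$ projects to $fS/Z$ pixels for a focal length $f$ expressed in pixels. The text does not specify the perspective model or the constraints; the pinhole model, the 0.24 m face height, the focal length of 800 pixels, and both function names are assumptions.

```python
def projected_face_height(focal_px, face_height_m, distance_m):
    """Pinhole projection: image height in pixels of a face at the given distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return focal_px * face_height_m / distance_m


def is_realizable(heights_px, image_height_px, min_face_px=1.0):
    """A simple validity check: every projected face must fit inside the image."""
    return all(min_face_px <= h <= image_height_px for h in heights_px)


# Faces placed 1 m, 2 m and 4 m from the camera (illustrative values).
heights = [projected_face_height(800.0, 0.24, z) for z in (1.0, 2.0, 4.0)]
fits = is_realizable(heights, image_height_px=480)
```

Doubling the distance halves the projected size, so a specification that lists distances fixes the relative face sizes in the image.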
Background images can be cluttered or uncluttered, natural or artificial, and the color and intensity of each background can be changed for different needs.
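Merging a selected face onto a background can be sketched as below. Because the position is chosen by the generator, the ground-truth box comes for free, without manual annotation. The sketch assumes grayscale arrays and a face already scaled to its target size; it ignores blending and overlap between faces, and the names `random_position` and `paste_face` are hypothetical.

```python
import numpy as np


def random_position(rng, background_shape, face_shape):
    """Draw a top-left corner so that the face lies fully inside the background."""
    bh, bw = background_shape
    h, w = face_shape
    top = int(rng.integers(0, bh - h + 1))
    left = int(rng.integers(0, bw - w + 1))
    return top, left


def paste_face(background, face, top, left):
    """Paste a face patch onto a copy of the background.

    Returns the new image and the ground-truth box (top, left, height, width).
    """
    h, w = face.shape
    bh, bw = background.shape
    if top < 0 or left < 0 or top + h > bh or left + w > bw:
        raise ValueError("face does not fit inside the background")
    out = background.copy()
    out[top:top + h, left:left + w] = face
    return out, (top, left, h, w)


rng = np.random.default_rng(2)
background = np.zeros((48, 64))   # uncluttered artificial background
face = np.ones((12, 10))          # stand-in for a cropped, resized face
top, left = random_position(rng, background.shape, face.shape)
image, box = paste_face(background, face, top, left)
```

Returning a copy leaves the background reusable for further samples, and the returned box is what a detector's output would be scored against.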
3. EXPERIMENTS AND DISCUSSION
Two scenarios are tested to show the validity of the proposed framework. The first shows that it can generate test images similar to those offered by the popular CMU/MIT face database. The second scenario shows that it can generate images unavailable from most face detection databases.
Given an arbitrary image from the CMU/MIT database, the features on each face are first marked manually, and the pose of each face is given by the longitude and latitude computed from the marked facial features. Different poses use different sets of features for determining the longitudes and latitudes. From the faces in the PFD with the same or similar poses, only the ones with similar illumination, as judged by the low-frequency DCT components, are selected as similar faces. Fig. 3 shows two typical samples. The originals from the CMU/MIT database are on the left, and the same images with faces replaced by similar ones from the PFD are on the right.
Out of the 346 images in the CMU/MIT database, our experiments show that 97.7% of the faces in these images can be replaced by those available from the PFD. A few with excessive expressions are considered unreplaceable.
Fig. 3. Originals from the CMU/MIT database are on the left, and with faces replaced on the right.
The second scenario is demonstrated in Figs. 4 and 5.
Fig. 4 shows samples taken from two daughter sets with different user-defined criteria. The parameters for the left one include profile pose $\pm 90^\circ$, lit on the right with ambient light on, orientation between $0^\circ$ and $30^\circ$, and size between $(0.2H)^2$ and $(0.3H)^2$, where $H$ is the image's height. The parameters for the right one include frontal pose, uniform illumination, orientation $< 10^\circ$, and size varying from $(0.08H)^2$ to $(0.3H)^2$ successively, with 10% occlusion on the smaller face behind. Fig. 5 shows a sample generated based on a desired 3D spatial distribution of the faces, as revealed by the top view of the spatial distribution on the right. The samples in Figs. 4 and 5 are not readily available from existing face detection databases.
Fig. 4. Left with parameters: [Pose $= \pm 90^\circ$, Illum = Right lit, Orien $< \pm 30^\circ$, Size $= (0.2 \sim 0.3H)^2$]. Right: [Pose $= 0^\circ$, Illum = Even, Orien $< \pm 10^\circ$, Size $= (0.08 \sim 0.3H)^2$].
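[Figure 5 diagram: generated sample and top view of the 3D spatial distribution of the faces; axis labels x, y, D.]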