Recent Advances in Face Detection
Ming-Hsuan Yang
myang@honda-ri.com
http://www.honda-ri.com http://vision.ai.uiuc.edu/mhyang
Honda Research Institute
Mountain View, California, USA
Face Detection: A Solved Problem?
Recent detectors have demonstrated excellent results: fast, multi-pose, partial occlusion, …
So, is face detection a solved problem?
No, not quite…
Omron’s face detector [Liu et al. 04]
Outline
Objective:
Survey major face detection works
Address “how” and “why” questions
Pros and cons of detection methods
Future research directions
Updated tutorial material:
http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html
Face Detection
Identify and locate human faces in an image regardless of their
position
scale
in-plane rotation
orientation
pose (out-of-plane rotation)
and illumination
Where are the faces, if any?
Why Face Detection is Important?
First step for any fully automatic face recognition system
First step in many surveillance systems
Face is a highly non-rigid object
Lots of applications
A step towards Automatic Target Recognition (ATR) or generic object detection/recognition
In One Thumbnail Face Image
Consider a thumbnail 19 × 19 face pattern
256^361 possible combinations of gray values
256^361 = 2^(8×361) = 2^2888
Total world population (as of 2004): 6,400,000,000 ≅ 2^32
87 times more than the world population!
Extremely high dimensional space!
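The arithmetic on this slide can be checked directly with exact integer math in a few lines of standard Python (a sanity check, not part of any detection method):

```python
# Sanity-check the slide's numbers with exact integer arithmetic.
n_pixels = 19 * 19                       # 361 pixels per thumbnail
n_patterns = 256 ** n_pixels             # 256 gray values per pixel

# 256^361 = (2^8)^361 = 2^2888
assert n_patterns == 2 ** (8 * n_pixels)
print(n_patterns.bit_length() - 1)       # → 2888

world_2004 = 6_400_000_000
print(world_2004.bit_length())           # → 33, i.e. roughly 2^32
```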
Why Face Detection Is Difficult?
Pose (out-of-plane rotation): frontal, 45 degree, profile, upside down
Presence or absence of structural components: beards, mustaches, and glasses
Facial expression: face appearance is directly affected by a person's facial expression
Occlusion: faces may be partially occluded by other objects
Orientation (in-plane rotation): face appearance varies directly for different rotations about the camera's optical axis
Imaging conditions: lighting (spectra, source distribution and intensity) and camera characteristics (sensor response, gain control, lenses), resolution
Related Problems
Face localization: aims to determine the image position of a single face
A simplified detection problem with the assumption that an input image contains only one face
Facial feature extraction: detects the presence and location of features such as eyes, nose, nostrils, eyebrows, mouth, lips, ears, etc
Usually assumes that there is only one face in an image
Face recognition (identification)
Facial expression recognition
Human pose estimation and tracking
Face Detection and Object Recognition
Detection concerns a category of object
Recognition concerns individual identity
Face is a highly non-rigid object
Many methods can be applied to other object detection/recognition tasks:
Car detection
Pedestrian detection
Human Detection and Tracking
Often used as a salient cue for human detection
Used as a strong cue to search for other body parts
Used to detect new objects and re-initialize a tracker once it fails
[Lee and Cohen 04] [Okuma et al. 04]
Research Issues
Representation: How to describe a typical face?
Scale: How to deal with faces of different sizes?
Search strategy: How to spot these faces?
Speed: How to speed up the process?
Precision: How to locate the faces precisely?
Post-processing: How to combine detection results?
Face Detector: Ingredients
Target application domain: single image, video
Representation: holistic, feature, etc
Pre-processing: histogram equalization, etc
Cues: color, motion, depth, voice, etc
Search strategy: exhaustive, greedy, focus of attention, etc
Classifier design: ensemble, cascade
Post-processing: combining detection results
In This Tutorial
[Taxonomy figure: face detection in video vs. a single image; color vs. gray scale; cues such as motion, depth, voice; upright frontal vs. pose, rotation, occlusion]
Focus on detecting upright, frontal faces in a single gray-scale image with decent resolution under good lighting conditions
See [Sinha 01] for detecting faces in low-resolution images
Methods to Detect/Locate Faces
Knowledge-based methods:
Encode human knowledge of what constitutes a typical face (usually, the relationships between facial features)
Feature invariant approaches:
Aim to find structural features of a face that exist even when the pose, viewpoint, or lighting conditions vary
Template matching methods:
Several standard patterns stored to describe the face as a whole or the facial features separately
Appearance-based methods:
The models (or templates) are learned from a set of training images which capture the representative variability of facial appearance
Many methods can be categorized in several ways
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Knowledge-Based Methods
Top-down approach: represent a face using a set of human-coded rules
Example:
The center part of the face has uniform intensity values
The difference between the average intensity values of the center part and the upper part is significant
A face often appears with two eyes that are symmetric to each other, a nose, and a mouth
Use these rules to guide the search process
Knowledge-Based Method:
[Yang and Huang 94]: multi-resolution focus-of-attention approach
Level 1 (lowest resolution): apply the rule “the center part of the face has 4 cells with a basically uniform intensity” to search for candidates
Level 2: local histogram equalization followed by edge detection
Level 3: search for eye and mouth features for validation
Knowledge-Based Method:
[Kotropoulos & Pitas 94]
Horizontal/vertical projection to search for candidates
Search eyebrows/eyes, nostrils/nose for validation
Difficult to detect multiple people or faces in a complex background

HI(x) = Σ_{y=1}^{n} I(x, y)
VI(y) = Σ_{x=1}^{m} I(x, y)

[Kotropoulos & Pitas 94]
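As a minimal sketch (assuming numpy, not the authors' code), the two projection profiles are just row and column sums of the image:

```python
import numpy as np

def projections(image):
    """Horizontal and vertical intensity projections of a gray image I:
    HI(x) = sum over y of I(x, y); VI(y) = sum over x of I(x, y).
    Candidate face regions show up as characteristic dips/peaks
    (e.g., dark eye rows) in these one-dimensional profiles."""
    I = np.asarray(image, dtype=np.float64)
    HI = I.sum(axis=0)   # one value per column x
    VI = I.sum(axis=1)   # one value per row y
    return HI, VI
```

Candidates are then hypothesized where the profiles change sharply, and validated by searching for eyebrow/eye and nostril/nose structure.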
Knowledge-based Methods: Summary
Pros:
Easy to come up with simple rules to describe the features of a face and their relationships
Based on the coded rules, facial features in an input image are extracted first, and face candidates are identified
Works well for face localization in uncluttered backgrounds
Cons:
Difficult to translate human knowledge into rules precisely: detailed rules fail to detect faces and general rules may find many false positives
Difficult to extend this approach to detect faces in different poses: implausible to enumerate all the possible cases
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Feature-Based Methods
Bottom-up approach: detect facial features (eyes, nose, mouth, etc) first
Facial features: edge, intensity, shape, texture, color, etc
Aim to detect invariant features
Group features into candidates and verify them
Random Graph Matching
[Leung et al. 95]: formulate detection as the problem of finding the correct geometric arrangement of facial features
Facial features are defined by the average responses of multi-orientation, multi-scale Gaussian derivative filters
Learn the configuration of features with a Gaussian distribution of mutual distances between facial features
Convolve an image with Gaussian filters to locate candidate features based on similarity
Random graph matching among the candidates to locate faces
Feature Grouping
[Yow and Cipolla 90]: apply a 2nd derivative Gaussian filter to search for interest points
Group the edges near interest points into regions
Each feature and grouping is evaluated within a Bayesian network
Handles a few poses
See also [Amit et al. 97] for an efficient hierarchical (focus of attention) feature-based method

Face model and components:
Model each facial feature as a pair of edges
Apply an interest point operator and edge detector to search for features
Use a Bayesian network to combine evidence
Feature-Based Methods: Summary
Pros:
Features are invariant to pose and orientation change
Cons:
Difficult to locate facial features due to several kinds of corruption (illumination, noise, occlusion)
Difficult to detect features in a complex background
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Template Matching Methods
Store a template
Predefined: based on edges or regions
Deformable: based on facial contours (e.g., snakes)
Templates are hand-coded (not learned)
Use correlation to locate faces
Face Template
[Sinha 94]: use relative pair-wise ratios of the brightness of facial regions (14 × 16 pixels): the eyes are usually darker than the surrounding face
Use average area intensity values rather than absolute pixel values
See also the Point Distribution Model (PDM) [Lanitis et al. 95]

Ratio template [Sinha 94]; average shape [Lanitis et al. 95]
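A toy sketch of the ratio idea, assuming numpy; the region boundaries below are illustrative stand-ins, not Sinha's published template:

```python
import numpy as np

def brighter(region_a, region_b):
    """Compare average area intensities rather than absolute pixels."""
    return float(np.mean(region_a)) > float(np.mean(region_b))

def looks_like_face(patch):
    """Toy ratio-template test on a 14x16 gray patch: the eye band
    should be darker than the cheek band below it. The row ranges
    here are hypothetical, chosen only for illustration."""
    patch = np.asarray(patch, dtype=np.float64)
    eyes = patch[3:6, :]      # hypothetical eye band
    cheeks = patch[8:11, :]   # hypothetical cheek band
    return brighter(cheeks, eyes)   # eyes darker than cheeks
```

Because only pairwise brightness orderings are tested, the check is insensitive to global illumination changes that rescale all regions together.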
Template-Based Methods: Summary
Pros:
Simple
Cons:
Templates need to be initialized near the face images
Difficult to enumerate templates for different poses (similar to knowledge-based methods)
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Appearance-Based Methods
Train a classifier using positive (and usually negative) examples of faces
Representation
Pre-processing
Train a classifier
Search strategy
Post-processing
View-based
Appearance-Based Methods: Classifiers
Neural network: multilayer perceptrons
Principal Component Analysis (PCA), factor analysis
Support vector machine (SVM)
Mixture of PCA, mixture of factor analyzers
Distribution-based method
Naïve Bayes classifier
Hidden Markov model
Sparse network of winnows (SNoW)
Kullback relative information
Inductive learning: C4.5
AdaBoost
…
Representation
Holistic: each image is raster scanned and represented by a vector of intensity values
Block-based: decompose each face image into a set of overlapping or non-overlapping blocks
At multiple scales
Further processed with vector quantization, Principal Component Analysis, etc.
Face and Non-Face Exemplars
Positive examples:
Get as much variation as possible
Manually crop and normalize each face image to a standard size (e.g., 19 × 19 pixels)
Creating virtual examples [Sung and Poggio 94]
Negative examples:
Fuzzy idea
Any images that do not contain faces
A large image subspace
Bootstrapping [Sung and Poggio 94]
Distribution-Based Method
[Sung & Poggio 94]
Masking: reduce the unwanted background noise in a face pattern
Illumination gradient correction: find the best-fit brightness plane and subtract it to reduce heavy shadows caused by extreme lighting angles
Histogram equalization: compensates for imaging effects due to changes in illumination and different camera input gains
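The last two steps can be sketched with numpy (a minimal illustration under simplifying assumptions; corner masking is omitted and equalization is done by rank mapping):

```python
import numpy as np

def preprocess(patch):
    """Sketch of the preprocessing chain: fit a best-fit brightness
    plane a*x + b*y + c by least squares and subtract it (illumination
    gradient correction), then histogram-equalize the residual by
    mapping pixel ranks onto 0..255."""
    patch = np.asarray(patch, dtype=np.float64)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coef, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
    residual = patch.ravel() - A @ coef          # shadows reduced
    ranks = residual.argsort().argsort()         # rank of each pixel
    return (ranks * 255.0 / (residual.size - 1)).reshape(h, w)
```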
Creating Virtual Positive Examples
Simple and very effective method
Randomly mirror, rotate, translate, and scale face samples by small amounts
Increases the number of training examples
Less sensitive to alignment error

Randomly mirrored, rotated, translated, and scaled faces [Sung & Poggio 94]
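A hedged sketch of this augmentation, assuming numpy; only mirroring and small translations are shown here (the original method also rotates and scales, which needs an interpolating image library):

```python
import numpy as np

def virtual_examples(face, n=10, max_shift=2, seed=0):
    """Generate perturbed copies of an aligned face patch.
    np.roll wraps pixels around the border, a simplification that is
    tolerable for shifts of only a pixel or two."""
    rng = np.random.default_rng(seed)
    face = np.asarray(face)
    out = []
    for _ in range(n):
        f = face[:, ::-1] if rng.random() < 0.5 else face   # mirror
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        out.append(np.roll(f, (int(dy), int(dx)), axis=(0, 1)))
    return out
```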
Distribution of Face/Non-face Pattern
Cluster face and non-face samples into a few (i.e., 6) clusters using the k-means algorithm
Each cluster is modeled by a multi-dimensional Gaussian with a centroid and covariance matrix
Approximate each Gaussian covariance with a subspace (i.e., using the largest eigenvectors)
See [Moghaddam and Pentland 97] on distribution-based learning using a Gaussian mixture model
[Sung & Poggio 94]

p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp( −(1/2) (x − µ)^T Σ^(−1) (x − µ) )

x: face or non-face sample vectors
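The cluster-and-model step can be sketched with a plain k-means in numpy (a minimal stand-in, not the original implementation); eigendecomposition of each returned covariance would then give the subspace approximation:

```python
import numpy as np

def cluster_gaussians(X, k=6, iters=20, seed=0):
    """Cluster sample vectors with k-means, then summarize each
    cluster by its centroid and covariance (a per-cluster Gaussian)."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # squared distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    gaussians = [(centers[j], np.cov(X[labels == j].T))
                 for j in range(k) if (labels == j).any()]
    return labels, gaussians
```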
Distance Metrics
[Sung & Poggio 94]: compute distances of a sample to all the face and non-face clusters
Each distance measure has two parts:
Within-subspace distance (D1): Mahalanobis distance of the projected sample to the cluster center
Distance to the subspace (D2): distance of the sample to the subspace
Feature vector: each face/non-face sample is represented by a vector of these distance measurements
Train a multilayer perceptron using the feature vectors for face detection
6 face clusters
6 non-face clusters
2 distance values per cluster
24 measurements

D1 = (1/2) (75 ln 2π + ln|Σ_75| + (x − µ)^T Σ_75^(−1) (x − µ)), computed in the 75-dimensional subspace
D2 = ||(I − E_75 E_75^T)(x − µ)||^2, where the columns of E_75 are the 75 largest eigenvectors

[Sung and Poggio 94]
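A numpy sketch of the two distances to one cluster (the constant normalization terms ln 2π and ln|Σ| in the slide's D1 are omitted here, since they do not depend on the sample):

```python
import numpy as np

def two_distances(x, mu, E, evals):
    """Two-part distance of sample x to a cluster (centroid mu):
    D1: Mahalanobis distance of the projection of x within the
        subspace spanned by the columns of E (top eigenvectors,
        with eigenvalues `evals`).
    D2: squared Euclidean distance from x to that subspace."""
    d = np.asarray(x, float) - np.asarray(mu, float)
    z = E.T @ d                      # coordinates within the subspace
    D1 = float(z @ (z / evals))      # (x_p - mu)^T Sigma^-1 (x_p - mu)
    resid = d - E @ z                # component orthogonal to subspace
    D2 = float(resid @ resid)        # ||(I - E E^T)(x - mu)||^2
    return D1, D2
```

With 6 face clusters and 6 non-face clusters, stacking the two values per cluster gives the 24-dimensional feature vector fed to the multilayer perceptron.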
Bootstrapping
[Sung and Poggio 94]
1. Start with a small set of non-face examples in the training set
2. Train an MLP classifier with the current training set
3. Run the learned face detector on a sequence of random images
4. Collect all the non-face patterns that the current system wrongly classifies as faces (i.e., false positives)
5. Add these non-face patterns to the training set
6. Go to Step 2, or stop if satisfied
→ Improves the system performance greatly

Test image; false positive detections
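The six steps above can be sketched as a loop; `train` and `detect` are hypothetical stand-ins for the classifier and the scanning detector, not the paper's code:

```python
def bootstrap(faces, nonfaces, scenery, train, detect, rounds=10):
    """Bootstrapping sketch. `train(pos, neg)` returns a classifier and
    `detect(model, image)` returns windows it labels as faces. Because
    `scenery` contains no faces, every detection there is by
    construction a false positive."""
    model = train(faces, nonfaces)
    for _ in range(rounds):
        false_positives = [w for img in scenery for w in detect(model, img)]
        if not false_positives:
            break                       # satisfied: no more mistakes
        nonfaces = nonfaces + false_positives
        model = train(faces, nonfaces)  # retrain with the hard negatives
    return model
```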
Search over Space and Scale
Scan an input image at one-pixel increments horizontally and vertically
Downsample the input image by a factor of 1.2 and continue to search
Continue to Search over Space and Scale
Continue to downsample the input image and search
until the image size is too small
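The space-and-scale search can be sketched as a sliding window over an image pyramid (assuming numpy; `classify` is a hypothetical face/non-face classifier, and the downsampling here is crude nearest-neighbor):

```python
import numpy as np

def search_space_scale(image, classify, win=19, factor=1.2):
    """Slide a win x win window at one-pixel steps, then downsample by
    `factor` and repeat until the image is smaller than the window.
    Detections are reported in original-image coordinates."""
    detections, scale = [], 1.0
    img = np.asarray(image, dtype=np.float64)
    while min(img.shape) >= win:
        h, w = img.shape
        for y in range(h - win + 1):
            for x in range(w - win + 1):
                if classify(img[y:y + win, x:x + win]):
                    detections.append((int(x * scale), int(y * scale),
                                       int(win * scale)))
        # nearest-neighbor downsampling by `factor`
        ys = (np.arange(int(h / factor)) * factor).astype(int)
        xs = (np.arange(int(w / factor)) * factor).astype(int)
        img = img[np.ix_(ys, xs)]
        scale *= factor
    return detections
```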
Experimental Results:
[Sung and Poggio 94]
There can be multiple detections of a face since it may be detected
at different scales
at slightly displaced window locations
Able to detect upright frontal faces
Neural Network-Based Detector
Train multiple multilayer perceptrons with different receptive fields [Rowley and Kanade 96]
Merge the overlapping detections within one network
Train an arbitration network to combine the results from different networks
Needs to find the right neural network architecture (number of layers, hidden units, etc.) and parameters (learning rate, etc.)
Dealing with Multiple Detects
Merging overlapping detections within one network [Rowley and Kanade 96]
Arbitration among multiple networks:
AND operator
OR operator
Voting
Arbitration network

Merging overlapping results; ANDing results from two networks
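A simple greedy overlap merge can stand in for the within-network merging step (a hedged sketch, not Rowley's exact procedure; boxes are hypothetical (x, y, size, score) tuples):

```python
def merge_detections(dets, min_overlap=0.5):
    """Keep the strongest detection in each group of overlapping
    square boxes; suppress any box whose intersection-over-union
    with an already-kept box exceeds `min_overlap`."""
    def iou(a, b):
        ax, ay, asz, _ = a
        bx, by, bsz, _ = b
        x1, y1 = max(ax, bx), max(ay, by)
        x2, y2 = min(ax + asz, bx + bsz), min(ay + asz, by + bsz)
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        return inter / (asz * asz + bsz * bsz - inter)
    kept = []
    for d in sorted(dets, key=lambda d: -d[3]):   # strongest first
        if all(iou(d, k) < min_overlap for k in kept):
            kept.append(d)
    return kept
```

AND/OR arbitration between networks can then be expressed as set intersection/union over the merged boxes.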
Experimental Results:
[Rowley et al. 96]

Detecting Rotated Faces
[Rowley et al. 98]
A router network is trained to estimate the angle of an input window
If it contains a face, the router returns the angle of the face and the face can be rotated back to the upright frontal position
Otherwise the router returns a meaningless angle
The de-rotated window is then applied to a detector (previously trained for upright frontal faces)
Router Network
[Rowley et al. 98]
Rotate a face sample at 10-degree increments
Create virtual examples (translation and scaling) from each sample
Train a multilayer neural network with input-output pairs

Input-output pairs to train a router network
Experimental Results
[Rowley et al. 98]: able to detect rotated faces with good results
Performance degrades in detecting upright frontal faces due to the use of the router network
See also [Feraud et al. 01]
Support Vector Machine (SVM)
Find the optimal separating hyperplane constructed by support vectors [Vapnik 95]
Maximize the distances between the data points closest to the separating hyperplane (large margin classifier)
Formulated as a quadratic programming problem
Kernel functions for nonlinear SVMs

[Figure: support vectors, margin, and the separating hyperplane]
SVM-Based Face Detector
[Osuna et al. 97]
Adopt an architecture similar to [Sung and Poggio 94], with an SVM classifier
Pros: good recognition rate with theoretical support
Cons:
Time consuming in training and testing
Need to pick the right kernel
SVM-Based Face Detector: Issues
Training: solve a complex quadratic optimization problem
Speed-up: Sequential Minimal Optimization (SMO) [Platt 99]
Testing: the number of support vectors may be large → lots of kernel computations
Speed-up: reduced set of support vectors [Romdhani et al. 01]
Variants:
Component-based SVM [Heisele et al. 01]:
Learn components and their geometric configuration
Less sensitive to pose variation
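Why testing cost scales with the support-vector count is visible directly in the decision function (a sketch with illustrative numbers, not a trained detector):

```python
import math

def rbf(u, v, gamma=0.5):
    """Gaussian RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, svs, alphas, labels, b, kernel=rbf):
    """Kernel SVM decision value f(x) = sum_i a_i y_i K(s_i, x) + b.
    Every scanned window costs one kernel evaluation per support
    vector, which is why reducing the support-vector set
    [Romdhani et al. 01] speeds up detection."""
    return sum(a * y * kernel(s, x)
               for a, y, s in zip(alphas, labels, svs)) + b
```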
Sparse Network of Winnows
[Roth 98]
On-line, mistake-driven algorithm
Attribute (feature) efficiency
Allocation of nodes and links is data driven
Complexity depends on the number of active features
Allows for combining tasks hierarchically
Multiplicative learning rule

[Figure: target nodes linked to active features]
SNoW-Based Face Detector
Multiplicative weight update algorithm:

Prediction is 1 iff w · x ≥ θ
If Class = 1 but w · x ≤ θ, w_i ← α w_i (if x_i = 1) (promotion)
If Class = 0 but w · x ≥ θ, w_i ← β w_i (if x_i = 1) (demotion)
Usually, α = 2, β = 0.5

Pros: on-line feature selection [Yang et al. 00]
Cons: needs a more powerful feature representation scheme
Also been applied to object recognition [Yang et al. 02]
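The multiplicative (Winnow) update described on this slide can be sketched in a few lines; this is a minimal illustration, not the SNoW implementation (the promotion test uses a strict inequality so a correct boundary prediction is left alone):

```python
def winnow_update(w, x, label, theta, alpha=2.0, beta=0.5):
    """One Winnow step on binary features x in {0, 1}.
    Predict 1 iff w.x >= theta; on a mistake, multiplicatively
    promote or demote exactly the weights of the active features."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    pred = 1 if score >= theta else 0
    if label == 1 and score < theta:       # missed a face: promotion
        w = [wi * alpha if xi == 1 else wi for wi, xi in zip(w, x)]
    elif label == 0 and score >= theta:    # false alarm: demotion
        w = [wi * beta if xi == 1 else wi for wi, xi in zip(w, x)]
    return w, pred
```

Because only active features are touched, the cost of each update depends on the number of active features, matching the attribute-efficiency claim above.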
Probabilistic Modeling of Local Appearance
[Schneiderman and Kanade 98]: using local appearance
Learn the distribution by parts using a naïve Bayes classifier
Apply the Bayesian decision rule
Further decompose the appearance into space, frequency, and orientation
Learn the joint distribution of object and position
Also a wavelet representation

p(region | object) = ∏_{k=1}^{n} p(subregion_k | object)

e.g., p(face window | face) = p(subregion_1 | face) × … × p(subregion_n | face), or with position: p(subregion_k, x, y, s | face) × …

Decision rule: declare a face iff p(region | object) / p(region | ¬object) > λ = P(¬object) / P(object)
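The factorized decision rule can be sketched as follows; `p_face` and `p_nonface` are hypothetical per-subregion likelihood functions, and the product is evaluated in log space to avoid underflow:

```python
import math

def face_log_likelihood_ratio(subregions, p_face, p_nonface):
    """Naive-Bayes factorization over subregions:
    log [ p(region|face) / p(region|non-face) ]
      = sum_k [ log p(subregion_k|face) - log p(subregion_k|non-face) ].
    Declare a face iff this exceeds log(lambda)."""
    return sum(math.log(p_face(s)) - math.log(p_nonface(s))
               for s in subregions)
```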
Detecting Faces in Different Poses
Extend to detect faces in different poses with multiple detectors
Each detector specializes to one view: frontal, left pose, and right pose
[Mikolajczyk et al. 01] extend [Schneiderman and Kanade 98] to detect faces from side pose to frontal view
Experimental Results
[Schneiderman and Kanade 98]: able to detect profile faces
Extended to detect cars [Schneiderman and Kanade 00]
Mixture of Factor Analyzers
[Yang et al. 00]
Generative method that performs clustering and dimensionality reduction within each cluster
Similar to probabilistic PCA but has more merits:
a proper density model
robust to noise
Use the mixture model to detect faces in different poses
Use EM to estimate all the parameters in the mixture model
See also [Moghaddam and Pentland 97] on using a probabilistic Gaussian mixture for object localization
Factor analysis:
x = Λz + u,   p(x | z) = N(Λz, Ψ)
z: hidden factor, x: observation, Λ: factor loading matrix, Ψ: noise covariance

Mixture of factor analyzers:
p(x | z, ω_j) = N(Λ_j z + µ_j, Ψ)
ω: mixture component indicator with mixing proportions π; Λ_j, µ_j: loadings and mean of component j

[Figure: factor faces learned for the frontal view and for the 45° view]
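The generative model above is easy to simulate. A small sketch of sampling from a single factor analyzer, x = Λz + u with u ~ N(0, Ψ); the dimensions and random seed are illustrative only:

```python
# Sampling from a single factor analyzer: x = Lambda z + u, u ~ N(0, Psi).
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 2                       # observation dim, hidden factor dim
Lambda = rng.normal(size=(d, k))  # factor loading matrix
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))   # diagonal noise covariance

z = rng.normal(size=k)            # hidden factor z ~ N(0, I)
u = rng.multivariate_normal(np.zeros(d), Psi)  # noise u ~ N(0, Psi)
x = Lambda @ z + u                # observation

# Marginalizing out z gives cov(x) = Lambda Lambda^T + Psi
cov_x = Lambda @ Lambda.T + Psi
```

A mixture of such analyzers simply picks a component j with probability π_j and then samples from N(Λ_j z + µ_j, Ψ).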
Fisher Linear Discriminant
[Yang et al. 00]
Fisherface (FLD) demonstrated good results in face recognition
Given a set of unlabeled face and non-face samples:
Apply a Self-Organizing Map (SOM) to cluster faces/non-faces, thereby obtaining labels for the samples
Apply FLD to find the optimal projection matrix for maximal separation
Estimate class-conditional density for detection
[Figure: unlabeled samples → SOM → face/non-face prototypes → FLD → class-conditional density via maximum likelihood estimation]
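For two classes, the FLD projection step reduces to w = S_w⁻¹(m₁ − m₂). A minimal sketch on synthetic 2-D data (the clusters, threshold rule, and function name are illustrative assumptions, not the original SOM-labeled pipeline):

```python
# Two-class Fisher Linear Discriminant: maximize between-class scatter
# relative to within-class scatter by projecting onto w = Sw^-1 (m1 - m2).
import numpy as np

rng = np.random.default_rng(1)
faces = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
nonfaces = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))

m1, m2 = faces.mean(axis=0), nonfaces.mean(axis=0)
# Within-class scatter matrix
Sw = ((faces - m1).T @ (faces - m1)) + ((nonfaces - m2).T @ (nonfaces - m2))
w = np.linalg.solve(Sw, m1 - m2)       # optimal projection direction

# Threshold halfway between the projected class means (a simple stand-in
# for the class-conditional density estimate used in the slide)
threshold = 0.5 * ((faces @ w).mean() + (nonfaces @ w).mean())

def is_face(x):
    return x @ w > threshold
```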
Adaboost
[Freund and Schapire 95]
Use a set of weak classifiers (εt < 0.5) and weighting on difficult examples for learning (sampling is based on the weights)
Given: (x1, y1), …, (xm, ym) where xi ∈ X, yi ∈ Y = {-1, +1}
Initialize D1(i) = 1/m
For t = 1, …, T:
1. Train a weak classifier using distribution Dt; get a weak hypothesis ht: X → {-1, +1} with error εt = Pr_{i~Dt}[ht(xi) ≠ yi]
2. Importance of ht: αt = ½ ln((1 − εt)/εt)
3. Update: D_{t+1}(i) = Dt(i)/Zt × e^{−αt} if ht(xi) = yi (correctly classified)
   D_{t+1}(i) = Dt(i)/Zt × e^{αt} if ht(xi) ≠ yi (incorrectly classified)
   where Zt is a normalization factor
Aggregate the classifiers: H(x) = sign(Σ_{t=1}^{T} αt ht(x))
Performs well and does not overfit in empirical studies
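The loop above can be sketched directly. This version uses threshold "stumps" on 1-D data as the weak learners, which is a simplifying assumption for illustration; any weak classifier with εt < 0.5 would do.

```python
# Sketch of Adaboost with 1-D decision stumps as weak classifiers.
import math

def adaboost(xs, ys, T=10):
    """xs: list of floats, ys: list of +1/-1 labels."""
    m = len(xs)
    D = [1.0 / m] * m                     # D1(i) = 1/m
    ensemble = []                         # (alpha_t, threshold, sign) per round
    candidates = sorted(set(xs))
    for _ in range(T):
        # Pick the stump h(x) = s*sign(x - theta) with lowest weighted error
        best = None
        for theta in candidates:
            for s in (+1, -1):
                err = sum(D[i] for i in range(m)
                          if (s if xs[i] > theta else -s) != ys[i])
                if best is None or err < best[0]:
                    best = (err, theta, s)
        eps, theta, s = best
        eps = min(max(eps, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)   # importance of h_t
        ensemble.append((alpha, theta, s))
        # Reweight: raise weight of misclassified examples, then normalize
        D = [D[i] * math.exp(-alpha if (s if xs[i] > theta else -s) == ys[i]
                             else alpha) for i in range(m)]
        Z = sum(D)
        D = [d / Z for d in D]

    def H(x):                             # H(x) = sign(sum_t alpha_t h_t(x))
        score = sum(a * (s if x > th else -s) for a, th, s in ensemble)
        return 1 if score >= 0 else -1
    return H
```

Each round the distribution D shifts mass onto the examples the current ensemble gets wrong, forcing later weak classifiers to focus on them.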
Adaboost-Based Detector
[Viola and Jones 01]
Main idea:
Feature selection: select important features
Focus of attention: focus on potential regions
Use an integral image for fast feature evaluation
Use Adaboost to learn:
A set of important features (feature selection)
sort them in the order of importance
each feature can be used as a simple (weak) classifier
A cascade of classifiers that
combine all the weak classifiers to do a difficult task
filter out the regions that most likely do not contain faces
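The integral image mentioned above stores, at each pixel, the sum of all pixels above and to the left, so any rectangle sum (and hence any Haar-like feature) costs only four lookups. A minimal sketch, with illustrative function names:

```python
# Integral image for O(1) rectangle sums, as used for fast feature evaluation.
def integral_image(img):
    """img: 2-D list of pixel values; returns a same-size integral image."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                       # running sum of this row
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] via four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

A two-rectangle Haar feature is then just the difference of two `rect_sum` calls, independent of rectangle size.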
Feature Selection
[Viola and Jones 01]
Training: if x is a face, then x
most likely has feature 1 (the easiest feature, and of greatest importance)
very likely has feature 2 (an easy feature)
…
likely has feature n (a more complex feature, of less importance since it does not exist in all the faces in the training set)
Testing: given a test sub-image x':
if x' has feature 1:
  test whether x' has feature 2:
    …
      test whether x' has feature n
    else, x' is not a face
  else, x' is not a face
else, x' is not a face
Similar to a decision tree: one simple implementation
[Figure: decision cascade: x' passes feature 1 (Yes) on to feature 2, …, on to feature n; passing every test means x' is a face; a No at any stage means x' is a non-face]
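The nested tests above collapse to a simple early-exit loop. A sketch, where the feature tests are placeholders standing in for the learned weak classifiers:

```python
# Cascade test: reject a sub-image the moment any stage fails.
def cascade_classify(subimage, feature_tests):
    """feature_tests: ordered list of predicates, easiest/most important first."""
    for has_feature in feature_tests:
        if not has_feature(subimage):
            return False          # rejected early: non-face
    return True                   # passed every stage: face

# Hypothetical usage with toy integer "sub-images" and toy stages:
stages = [lambda x: x > 0, lambda x: x % 2 == 0, lambda x: x < 100]
```

Because the vast majority of sub-windows fail the first, cheapest stage, the expected cost per window is far below the cost of evaluating all n features.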