Recent Advances in Face Detection
Ming-Hsuan Yang
myang@honda-ri.com
http://www.honda-ri.com http://vision.ai.uiuc.edu/mhyang
Honda Research Institute
Mountain View, California, USA
Face Detection: A Solved Problem?
Recent detectors have demonstrated excellent results: fast, multi-pose, partial occlusion, …
So, is face detection a solved problem?
No, not quite…
Omron’s face detector [Liu et al. 04]
Outline
Objective:
Survey major face detection works
Address “how” and “why” questions
Pros and cons of detection methods
Future research directions
Updated tutorial material:
http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html
Face Detection
Identify and locate human faces in an image regardless of their
position
scale
in-plane rotation
orientation
pose (out-of-plane rotation)
and illumination
Where are the faces, if any?
Why Face Detection is Important?
First step for any fully automatic face recognition system
First step in many surveillance systems
Face is a highly non-rigid object
Lots of applications
A step towards Automatic Target Recognition (ATR) or generic object detection/recognition
In One Thumbnail Face Image
Consider a thumbnail 19 × 19 face pattern
256^361 possible combinations of gray values
256^361 = 2^(8×361) = 2^2888
Total world population (as of 2004): 6,400,000,000 ≅ 2^32
87 times more than the world population!
Extremely high dimensional space!
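The arithmetic on this slide can be checked directly with exact integer math in a few lines of standard Python (a sanity check, not part of any detection method):

```python
# Sanity-check the slide's numbers with exact integer arithmetic.
n_pixels = 19 * 19                       # 361 pixels per thumbnail
n_patterns = 256 ** n_pixels             # 256 gray values per pixel

# 256^361 = (2^8)^361 = 2^2888
assert n_patterns == 2 ** (8 * n_pixels)
print(n_patterns.bit_length() - 1)       # → 2888

world_2004 = 6_400_000_000
print(world_2004.bit_length())           # → 33, i.e. roughly 2^32
```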
Why Face Detection Is Difficult?
Pose (out-of-plane rotation): frontal, 45 degree, profile, upside down
Presence or absence of structural components: beards, mustaches, and glasses
Facial expression: face appearance is directly affected by a person's facial expression
Occlusion: faces may be partially occluded by other objects
Orientation (in-plane rotation): face appearance varies directly for different rotations about the camera's optical axis
Imaging conditions: lighting (spectra, source distribution and intensity) and camera characteristics (sensor response, gain control, lenses), resolution
Related Problems
Face localization: aims to determine the image position of a single face
A simplified detection problem with the assumption that an input image contains only one face
Facial feature extraction: detects the presence and location of features such as eyes, nose, nostrils, eyebrows, mouth, lips, ears, etc
Usually assumes that there is only one face in an image
Face recognition (identification)
Facial expression recognition
Human pose estimation and tracking
Face Detection and Object Recognition
Detection concerns a category of object
Recognition concerns individual identity
Face is a highly non-rigid object
Many methods can be applied to other object detection/recognition tasks:
Car detection
Pedestrian detection
Human Detection and Tracking
Often used as a salient cue for human detection
Used as a strong cue to search for other body parts
Used to detect new objects and re-initialize a tracker once it fails
[Lee and Cohen 04] [Okuma et al. 04]
Research Issues
Representation: How to describe a typical face?
Scale: How to deal with faces of different sizes?
Search strategy: How to spot these faces?
Speed: How to speed up the process?
Precision: How to locate the faces precisely?
Post-processing: How to combine detection results?
Face Detector: Ingredients
Target application domain: single image, video
Representation: holistic, feature, etc
Pre-processing: histogram equalization, etc
Cues: color, motion, depth, voice, etc
Search strategy: exhaustive, greedy, focus of attention, etc
Classifier design: ensemble, cascade
Post-processing: combining detection results
In This Tutorial
[Taxonomy figure: face detection in video vs. a single image; color vs. gray scale; cues such as motion, depth, voice; upright frontal vs. pose, rotation, occlusion]
Focus on detecting upright, frontal faces in a single gray-scale image with decent resolution under good lighting conditions
See [Sinha 01] for detecting faces in low-resolution images
Methods to Detect/Locate Faces
Knowledge-based methods:
Encode human knowledge of what constitutes a typical face (usually, the relationships between facial features)
Feature invariant approaches:
Aim to find structural features of a face that exist even when the pose, viewpoint, or lighting conditions vary
Template matching methods:
Several standard patterns stored to describe the face as a whole or the facial features separately
Appearance-based methods:
The models (or templates) are learned from a set of training images which capture the representative variability of facial appearance
Many methods can be categorized in several ways
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Knowledge-Based Methods
Top-down approach: represent a face using a set of human-coded rules
Example:
The center part of the face has uniform intensity values
The difference between the average intensity values of the center part and the upper part is significant
A face often appears with two eyes that are symmetric to each other, a nose, and a mouth
Use these rules to guide the search process
Knowledge-Based Method:
[Yang and Huang 94]: multi-resolution focus-of-attention approach
Level 1 (lowest resolution): apply the rule “the center part of the face has 4 cells with a basically uniform intensity” to search for candidates
Level 2: local histogram equalization followed by edge detection
Level 3: search for eye and mouth features for validation
Knowledge-Based Method:
[Kotropoulos & Pitas 94]
Horizontal/vertical projection to search for candidates
Search eyebrows/eyes, nostrils/nose for validation
Difficult to detect multiple people or faces in a complex background

HI(x) = Σ_{y=1}^{n} I(x, y)
VI(y) = Σ_{x=1}^{m} I(x, y)

[Kotropoulos & Pitas 94]
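As a minimal sketch (assuming numpy, not the authors' code), the two projection profiles are just row and column sums of the image:

```python
import numpy as np

def projections(image):
    """Horizontal and vertical intensity projections of a gray image I:
    HI(x) = sum over y of I(x, y); VI(y) = sum over x of I(x, y).
    Candidate face regions show up as characteristic dips/peaks
    (e.g., dark eye rows) in these one-dimensional profiles."""
    I = np.asarray(image, dtype=np.float64)
    HI = I.sum(axis=0)   # one value per column x
    VI = I.sum(axis=1)   # one value per row y
    return HI, VI
```

Candidates are then hypothesized where the profiles change sharply, and validated by searching for eyebrow/eye and nostril/nose structure.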
Knowledge-based Methods: Summary
Pros:
Easy to come up with simple rules to describe the features of a face and their relationships
Based on the coded rules, facial features in an input image are extracted first, and face candidates are identified
Works well for face localization in uncluttered backgrounds
Cons:
Difficult to translate human knowledge into rules precisely: detailed rules fail to detect faces and general rules may find many false positives
Difficult to extend this approach to detect faces in different poses: implausible to enumerate all the possible cases
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Feature-Based Methods
Bottom-up approach: detect facial features (eyes, nose, mouth, etc) first
Facial features: edge, intensity, shape, texture, color, etc
Aim to detect invariant features
Group features into candidates and verify them
Random Graph Matching
[Leung et al. 95]: formulate detection as the problem of finding the correct geometric arrangement of facial features
Facial features are defined by the average responses of multi-orientation, multi-scale Gaussian derivative filters
Learn the configuration of features with a Gaussian distribution of mutual distances between facial features
Convolve an image with Gaussian filters to locate candidate features based on similarity
Random graph matching among the candidates to locate faces
Feature Grouping
[Yow and Cipolla 90]: apply a 2nd derivative Gaussian filter to search for interest points
Group the edges near interest points into regions
Each feature and grouping is evaluated within a Bayesian network
Handles a few poses
See also [Amit et al. 97] for an efficient hierarchical (focus of attention) feature-based method

Face model and components:
Model each facial feature as a pair of edges
Apply an interest point operator and edge detector to search for features
Use a Bayesian network to combine evidence
Feature-Based Methods: Summary
Pros:
Features are invariant to pose and orientation change
Cons:
Difficult to locate facial features due to several kinds of corruption (illumination, noise, occlusion)
Difficult to detect features in a complex background
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Template Matching Methods
Store a template
Predefined: based on edges or regions
Deformable: based on facial contours (e.g., snakes)
Templates are hand-coded (not learned)
Use correlation to locate faces
Face Template
[Sinha 94]: use relative pair-wise ratios of the brightness of facial regions (14 × 16 pixels): the eyes are usually darker than the surrounding face
Use average area intensity values rather than absolute pixel values
See also the Point Distribution Model (PDM) [Lanitis et al. 95]

Ratio template [Sinha 94]; average shape [Lanitis et al. 95]
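A toy sketch of the ratio idea, assuming numpy; the region boundaries below are illustrative stand-ins, not Sinha's published template:

```python
import numpy as np

def brighter(region_a, region_b):
    """Compare average area intensities rather than absolute pixels."""
    return float(np.mean(region_a)) > float(np.mean(region_b))

def looks_like_face(patch):
    """Toy ratio-template test on a 14x16 gray patch: the eye band
    should be darker than the cheek band below it. The row ranges
    here are hypothetical, chosen only for illustration."""
    patch = np.asarray(patch, dtype=np.float64)
    eyes = patch[3:6, :]      # hypothetical eye band
    cheeks = patch[8:11, :]   # hypothetical cheek band
    return brighter(cheeks, eyes)   # eyes darker than cheeks
```

Because only pairwise brightness orderings are tested, the check is insensitive to global illumination changes that rescale all regions together.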
Template-Based Methods: Summary
Pros:
Simple
Cons:
Templates need to be initialized near the face images
Difficult to enumerate templates for different poses (similar to knowledge-based methods)
Agenda
Detecting faces in gray scale images
Knowledge-based
Feature-based
Template-based
Appearance-based
Detecting faces in color images
Detecting faces in video
Performance evaluation
Research direction and concluding remarks
Appearance-Based Methods
Train a classifier using positive (and usually negative) examples of faces
Representation
Pre-processing
Train a classifier
Search strategy
Post-processing
View-based
Appearance-Based Methods: Classifiers
Neural network: multilayer perceptrons
Principal Component Analysis (PCA), factor analysis
Support vector machine (SVM)
Mixture of PCA, mixture of factor analyzers
Distribution-based method
Naïve Bayes classifier
Hidden Markov model
Sparse network of winnows (SNoW)
Kullback relative information
Inductive learning: C4.5
AdaBoost
…
Representation
Holistic: each image is raster scanned and represented by a vector of intensity values
Block-based: decompose each face image into a set of overlapping or non-overlapping blocks
At multiple scales
Further processed with vector quantization, Principal Component Analysis, etc.
Face and Non-Face Exemplars
Positive examples:
Get as much variation as possible
Manually crop and normalize each face image to a standard size (e.g., 19 × 19 pixels)
Creating virtual examples [Sung and Poggio 94]
Negative examples:
Fuzzy idea
Any images that do not contain faces
A large image subspace
Bootstrapping [Sung and Poggio 94]
Distribution-Based Method
[Sung & Poggio 94]
Masking: reduce the unwanted background noise in a face pattern
Illumination gradient correction: find the best-fit brightness plane and subtract it to reduce heavy shadows caused by extreme lighting angles
Histogram equalization: compensates for imaging effects due to changes in illumination and different camera input gains
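The last two steps can be sketched with numpy (a minimal illustration under simplifying assumptions; corner masking is omitted and equalization is done by rank mapping):

```python
import numpy as np

def preprocess(patch):
    """Sketch of the preprocessing chain: fit a best-fit brightness
    plane a*x + b*y + c by least squares and subtract it (illumination
    gradient correction), then histogram-equalize the residual by
    mapping pixel ranks onto 0..255."""
    patch = np.asarray(patch, dtype=np.float64)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coef, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
    residual = patch.ravel() - A @ coef          # shadows reduced
    ranks = residual.argsort().argsort()         # rank of each pixel
    return (ranks * 255.0 / (residual.size - 1)).reshape(h, w)
```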
Creating Virtual Positive Examples
Simple and very effective method
Randomly mirror, rotate, translate, and scale face samples by small amounts
Increases the number of training examples
Less sensitive to alignment error

Randomly mirrored, rotated, translated, and scaled faces [Sung & Poggio 94]
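A hedged sketch of this augmentation, assuming numpy; only mirroring and small translations are shown here (the original method also rotates and scales, which needs an interpolating image library):

```python
import numpy as np

def virtual_examples(face, n=10, max_shift=2, seed=0):
    """Generate perturbed copies of an aligned face patch.
    np.roll wraps pixels around the border, a simplification that is
    tolerable for shifts of only a pixel or two."""
    rng = np.random.default_rng(seed)
    face = np.asarray(face)
    out = []
    for _ in range(n):
        f = face[:, ::-1] if rng.random() < 0.5 else face   # mirror
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        out.append(np.roll(f, (int(dy), int(dx)), axis=(0, 1)))
    return out
```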
Distribution of Face/Non-face Pattern
Cluster face and non-face samples into a few (i.e., 6) clusters using the k-means algorithm
Each cluster is modeled by a multi-dimensional Gaussian with a centroid and covariance matrix
Approximate each Gaussian covariance with a subspace (i.e., using the largest eigenvectors)
See [Moghaddam and Pentland 97] on distribution-based learning using a Gaussian mixture model
[Sung & Poggio 94]

p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp( −(1/2) (x − µ)^T Σ^(−1) (x − µ) )

x: face or non-face sample vectors
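The cluster-and-model step can be sketched with a plain k-means in numpy (a minimal stand-in, not the original implementation); eigendecomposition of each returned covariance would then give the subspace approximation:

```python
import numpy as np

def cluster_gaussians(X, k=6, iters=20, seed=0):
    """Cluster sample vectors with k-means, then summarize each
    cluster by its centroid and covariance (a per-cluster Gaussian)."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # squared distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    gaussians = [(centers[j], np.cov(X[labels == j].T))
                 for j in range(k) if (labels == j).any()]
    return labels, gaussians
```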
Distance Metrics
[Sung & Poggio 94]: compute distances of a sample to all the face and non-face clusters
Each distance measure has two parts:
Within-subspace distance (D1): Mahalanobis distance of the projected sample to the cluster center
Distance to the subspace (D2): distance of the sample to the subspace
Feature vector: each face/non-face sample is represented by a vector of these distance measurements
Train a multilayer perceptron using the feature vectors for face detection
6 face clusters
6 non-face clusters
2 distance values per cluster
24 measurements

D1 = (1/2) (75 ln 2π + ln|Σ_75| + (x − µ)^T Σ_75^(−1) (x − µ)), computed in the 75-dimensional subspace
D2 = ||(I − E_75 E_75^T)(x − µ)||^2, where the columns of E_75 are the 75 largest eigenvectors

[Sung and Poggio 94]
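A numpy sketch of the two distances to one cluster (the constant normalization terms ln 2π and ln|Σ| in the slide's D1 are omitted here, since they do not depend on the sample):

```python
import numpy as np

def two_distances(x, mu, E, evals):
    """Two-part distance of sample x to a cluster (centroid mu):
    D1: Mahalanobis distance of the projection of x within the
        subspace spanned by the columns of E (top eigenvectors,
        with eigenvalues `evals`).
    D2: squared Euclidean distance from x to that subspace."""
    d = np.asarray(x, float) - np.asarray(mu, float)
    z = E.T @ d                      # coordinates within the subspace
    D1 = float(z @ (z / evals))      # (x_p - mu)^T Sigma^-1 (x_p - mu)
    resid = d - E @ z                # component orthogonal to subspace
    D2 = float(resid @ resid)        # ||(I - E E^T)(x - mu)||^2
    return D1, D2
```

With 6 face clusters and 6 non-face clusters, stacking the two values per cluster gives the 24-dimensional feature vector fed to the multilayer perceptron.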
Bootstrapping
[Sung and Poggio 94]
1. Start with a small set of non-face examples in the training set
2. Train an MLP classifier with the current training set
3. Run the learned face detector on a sequence of random images
4. Collect all the non-face patterns that the current system wrongly classifies as faces (i.e., false positives)
5. Add these non-face patterns to the training set
6. Go to Step 2, or stop if satisfied
→ Improves the system performance greatly

Test image; false positive detections
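The six steps above can be sketched as a loop; `train` and `detect` are hypothetical stand-ins for the classifier and the scanning detector, not the paper's code:

```python
def bootstrap(faces, nonfaces, scenery, train, detect, rounds=10):
    """Bootstrapping sketch. `train(pos, neg)` returns a classifier and
    `detect(model, image)` returns windows it labels as faces. Because
    `scenery` contains no faces, every detection there is by
    construction a false positive."""
    model = train(faces, nonfaces)
    for _ in range(rounds):
        false_positives = [w for img in scenery for w in detect(model, img)]
        if not false_positives:
            break                       # satisfied: no more mistakes
        nonfaces = nonfaces + false_positives
        model = train(faces, nonfaces)  # retrain with the hard negatives
    return model
```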
Search over Space and Scale
Scan an input image at one-pixel increments horizontally and vertically
Downsample the input image by a factor of 1.2 and continue to search
Continue to Search over Space and Scale
Continue to downsample the input image and search
until the image size is too small
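The space-and-scale search can be sketched as a sliding window over an image pyramid (assuming numpy; `classify` is a hypothetical face/non-face classifier, and the downsampling here is crude nearest-neighbor):

```python
import numpy as np

def search_space_scale(image, classify, win=19, factor=1.2):
    """Slide a win x win window at one-pixel steps, then downsample by
    `factor` and repeat until the image is smaller than the window.
    Detections are reported in original-image coordinates."""
    detections, scale = [], 1.0
    img = np.asarray(image, dtype=np.float64)
    while min(img.shape) >= win:
        h, w = img.shape
        for y in range(h - win + 1):
            for x in range(w - win + 1):
                if classify(img[y:y + win, x:x + win]):
                    detections.append((int(x * scale), int(y * scale),
                                       int(win * scale)))
        # nearest-neighbor downsampling by `factor`
        ys = (np.arange(int(h / factor)) * factor).astype(int)
        xs = (np.arange(int(w / factor)) * factor).astype(int)
        img = img[np.ix_(ys, xs)]
        scale *= factor
    return detections
```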
Experimental Results:
[Sung and Poggio 94]
There can be multiple detections of a face since it may be detected
at different scales
at slightly displaced window locations
Able to detect upright frontal faces
Neural Network-Based Detector
Train multiple multilayer perceptrons with different receptive fields [Rowley and Kanade 96]
Merge the overlapping detections within one network
Train an arbitration network to combine the results from different networks
Needs to find the right neural network architecture (number of layers, hidden units, etc.) and parameters (learning rate, etc.)
Dealing with Multiple Detects
Merging overlapping detections within one network [Rowley and Kanade 96]
Arbitration among multiple networks:
AND operator
OR operator
Voting
Arbitration network

Merging overlapping results; ANDing results from two networks
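A simple greedy overlap merge can stand in for the within-network merging step (a hedged sketch, not Rowley's exact procedure; boxes are hypothetical (x, y, size, score) tuples):

```python
def merge_detections(dets, min_overlap=0.5):
    """Keep the strongest detection in each group of overlapping
    square boxes; suppress any box whose intersection-over-union
    with an already-kept box exceeds `min_overlap`."""
    def iou(a, b):
        ax, ay, asz, _ = a
        bx, by, bsz, _ = b
        x1, y1 = max(ax, bx), max(ay, by)
        x2, y2 = min(ax + asz, bx + bsz), min(ay + asz, by + bsz)
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        return inter / (asz * asz + bsz * bsz - inter)
    kept = []
    for d in sorted(dets, key=lambda d: -d[3]):   # strongest first
        if all(iou(d, k) < min_overlap for k in kept):
            kept.append(d)
    return kept
```

AND/OR arbitration between networks can then be expressed as set intersection/union over the merged boxes.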
Experimental Results:
[Rowley et al. 96]

Detecting Rotated Faces
[Rowley et al. 98]
A router network is trained to estimate the angle of an input window
If it contains a face, the router returns the angle of the face and the face can be rotated back to the upright frontal position
Otherwise the router returns a meaningless angle
The de-rotated window is then applied to a detector (previously trained for upright frontal faces)
Router Network
[Rowley et al. 98]
Rotate a face sample at 10-degree increments
Create virtual examples (translation and scaling) from each sample
Train a multilayer neural network with input-output pairs

Input-output pairs to train a router network
Experimental Results
[Rowley et al. 98]: able to detect rotated faces with good results
Performance degrades in detecting upright frontal faces due to the use of the router network
See also [Feraud et al. 01]
Support Vector Machine (SVM)
Find the optimal separating hyperplane constructed by support vectors [Vapnik 95]
Maximize the distances between the data points closest to the separating hyperplane (large margin classifier)
Formulated as a quadratic programming problem
Kernel functions for nonlinear SVMs

[Figure: support vectors, margin, and the separating hyperplane]
SVM-Based Face Detector
[Osuna et al. 97]
Adopt an architecture similar to [Sung and Poggio 94], with an SVM classifier
Pros: good recognition rate with theoretical support
Cons:
Time consuming in training and testing
Need to pick the right kernel
SVM-Based Face Detector: Issues
Training: solve a complex quadratic optimization problem
Speed-up: Sequential Minimal Optimization (SMO) [Platt 99]
Testing: the number of support vectors may be large → lots of kernel computations
Speed-up: reduced set of support vectors [Romdhani et al. 01]
Variants:
Component-based SVM [Heisele et al. 01]:
Learn components and their geometric configuration
Less sensitive to pose variation
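Why testing cost scales with the support-vector count is visible directly in the decision function (a sketch with illustrative numbers, not a trained detector):

```python
import math

def rbf(u, v, gamma=0.5):
    """Gaussian RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, svs, alphas, labels, b, kernel=rbf):
    """Kernel SVM decision value f(x) = sum_i a_i y_i K(s_i, x) + b.
    Every scanned window costs one kernel evaluation per support
    vector, which is why reducing the support-vector set
    [Romdhani et al. 01] speeds up detection."""
    return sum(a * y * kernel(s, x)
               for a, y, s in zip(alphas, labels, svs)) + b
```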
Sparse Network of Winnows
[Roth 98]
On-line, mistake-driven algorithm
Attribute (feature) efficiency
Allocation of nodes and links is data driven
Complexity depends on the number of active features
Allows for combining tasks hierarchically
Multiplicative learning rule

[Figure: target nodes linked to active features]
SNoW-Based Face Detector
Multiplicative weight update algorithm:

Prediction is 1 iff w · x ≥ θ
If Class = 1 but w · x ≤ θ, w_i ← α w_i (if x_i = 1) (promotion)
If Class = 0 but w · x ≥ θ, w_i ← β w_i (if x_i = 1) (demotion)
Usually, α = 2, β = 0.5

Pros: on-line feature selection [Yang et al. 00]
Cons: needs a more powerful feature representation scheme
Also been applied to object recognition [Yang et al. 02]
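The multiplicative (Winnow) update described on this slide can be sketched in a few lines; this is a minimal illustration, not the SNoW implementation (the promotion test uses a strict inequality so a correct boundary prediction is left alone):

```python
def winnow_update(w, x, label, theta, alpha=2.0, beta=0.5):
    """One Winnow step on binary features x in {0, 1}.
    Predict 1 iff w.x >= theta; on a mistake, multiplicatively
    promote or demote exactly the weights of the active features."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    pred = 1 if score >= theta else 0
    if label == 1 and score < theta:       # missed a face: promotion
        w = [wi * alpha if xi == 1 else wi for wi, xi in zip(w, x)]
    elif label == 0 and score >= theta:    # false alarm: demotion
        w = [wi * beta if xi == 1 else wi for wi, xi in zip(w, x)]
    return w, pred
```

Because only active features are touched, the cost of each update depends on the number of active features, matching the attribute-efficiency claim above.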
Probabilistic Modeling of Local Appearance
[Schneiderman and Kanade 98]: using local appearance
Learn the distribution by parts using a naïve Bayes classifier
Apply the Bayesian decision rule
Further decompose the appearance into space, frequency, and orientation
Learn the joint distribution of object and position
Also a wavelet representation

p(region | object) = ∏_{k=1}^{n} p(subregion_k | object)

e.g., p(face window | face) = p(subregion_1 | face) × … × p(subregion_n | face), or with position: p(subregion_k, x, y, s | face) × …

Decision rule: declare a face iff p(region | object) / p(region | ¬object) > λ = P(¬object) / P(object)
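The factorized decision rule can be sketched as follows; `p_face` and `p_nonface` are hypothetical per-subregion likelihood functions, and the product is evaluated in log space to avoid underflow:

```python
import math

def face_log_likelihood_ratio(subregions, p_face, p_nonface):
    """Naive-Bayes factorization over subregions:
    log [ p(region|face) / p(region|non-face) ]
      = sum_k [ log p(subregion_k|face) - log p(subregion_k|non-face) ].
    Declare a face iff this exceeds log(lambda)."""
    return sum(math.log(p_face(s)) - math.log(p_nonface(s))
               for s in subregions)
```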
Detecting Faces in Different Poses
Extend to detect faces in different poses with multiple detectors
Each detector specializes to one view: frontal, left pose, and right pose
[Mikolajczyk et al. 01] extend [Schneiderman and Kanade 98] to detect faces from side pose to frontal view
Experimental Results
[Schneiderman and Kanade 98]: able to detect profile faces
Extended to detect cars [Schneiderman and Kanade 00]
Mixture of Factor Analyzers
[Yang et al. 00]
Generative method that performs clustering and dimensionality reduction within each cluster
Similar to probabilistic PCA but has more merits:
a proper density model
robust to noise
Use the mixture model to detect faces in different poses
Use EM to estimate all the parameters in the mixture model
See also [Moghaddam and Pentland 97] on using a probabilistic Gaussian mixture for object localization
Factor analysis:
x = Λz + u,   p(x | z) = N(Λz, Ψ)
z: hidden factor, x: observation, Λ: factor loading matrix, Ψ: noise covariance

Mixture of factor analyzers:
p(x | z, ω_j) = N(Λ_j z + µ_j, Ψ)
ω: mixture component indicator with mixing proportions π; Λ_j, µ_j: loadings and mean of component j

[Figure: factor faces learned for the frontal view and for the 45° view]
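The generative model above is easy to simulate. A small sketch of sampling from a single factor analyzer, x = Λz + u with u ~ N(0, Ψ); the dimensions and random seed are illustrative only:

```python
# Sampling from a single factor analyzer: x = Lambda z + u, u ~ N(0, Psi).
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 2                       # observation dim, hidden factor dim
Lambda = rng.normal(size=(d, k))  # factor loading matrix
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))   # diagonal noise covariance

z = rng.normal(size=k)            # hidden factor z ~ N(0, I)
u = rng.multivariate_normal(np.zeros(d), Psi)  # noise u ~ N(0, Psi)
x = Lambda @ z + u                # observation

# Marginalizing out z gives cov(x) = Lambda Lambda^T + Psi
cov_x = Lambda @ Lambda.T + Psi
```

A mixture of such analyzers simply picks a component j with probability π_j and then samples from N(Λ_j z + µ_j, Ψ).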
Fisher Linear Discriminant
[Yang et al. 00]
Fisherface (FLD) demonstrated good results in face recognition
Given a set of unlabeled face and non-face samples:
Apply a Self-Organizing Map (SOM) to cluster faces/non-faces, thereby obtaining labels for the samples
Apply FLD to find the optimal projection matrix for maximal separation
Estimate class-conditional density for detection
[Figure: unlabeled samples → SOM → face/non-face prototypes → FLD → class-conditional density via maximum likelihood estimation]
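For two classes, the FLD projection step reduces to w = S_w⁻¹(m₁ − m₂). A minimal sketch on synthetic 2-D data (the clusters, threshold rule, and function name are illustrative assumptions, not the original SOM-labeled pipeline):

```python
# Two-class Fisher Linear Discriminant: maximize between-class scatter
# relative to within-class scatter by projecting onto w = Sw^-1 (m1 - m2).
import numpy as np

rng = np.random.default_rng(1)
faces = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
nonfaces = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))

m1, m2 = faces.mean(axis=0), nonfaces.mean(axis=0)
# Within-class scatter matrix
Sw = ((faces - m1).T @ (faces - m1)) + ((nonfaces - m2).T @ (nonfaces - m2))
w = np.linalg.solve(Sw, m1 - m2)       # optimal projection direction

# Threshold halfway between the projected class means (a simple stand-in
# for the class-conditional density estimate used in the slide)
threshold = 0.5 * ((faces @ w).mean() + (nonfaces @ w).mean())

def is_face(x):
    return x @ w > threshold
```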
Adaboost
[Freund and Schapire 95]
Use a set of weak classifiers (εt < 0.5) and weighting on difficult examples for learning (sampling is based on the weights)
Given: (x1, y1), …, (xm, ym) where xi ∈ X, yi ∈ Y = {-1, +1}
Initialize D1(i) = 1/m
For t = 1, …, T:
1. Train a weak classifier using distribution Dt; get a weak hypothesis ht: X → {-1, +1} with error εt = Pr_{i~Dt}[ht(xi) ≠ yi]
2. Importance of ht: αt = ½ ln((1 − εt)/εt)
3. Update: D_{t+1}(i) = Dt(i)/Zt × e^{−αt} if ht(xi) = yi (correctly classified)
   D_{t+1}(i) = Dt(i)/Zt × e^{αt} if ht(xi) ≠ yi (incorrectly classified)
   where Zt is a normalization factor
Aggregate the classifiers: H(x) = sign(Σ_{t=1}^{T} αt ht(x))
Performs well and does not overfit in empirical studies
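The loop above can be sketched directly. This version uses threshold "stumps" on 1-D data as the weak learners, which is a simplifying assumption for illustration; any weak classifier with εt < 0.5 would do.

```python
# Sketch of Adaboost with 1-D decision stumps as weak classifiers.
import math

def adaboost(xs, ys, T=10):
    """xs: list of floats, ys: list of +1/-1 labels."""
    m = len(xs)
    D = [1.0 / m] * m                     # D1(i) = 1/m
    ensemble = []                         # (alpha_t, threshold, sign) per round
    candidates = sorted(set(xs))
    for _ in range(T):
        # Pick the stump h(x) = s*sign(x - theta) with lowest weighted error
        best = None
        for theta in candidates:
            for s in (+1, -1):
                err = sum(D[i] for i in range(m)
                          if (s if xs[i] > theta else -s) != ys[i])
                if best is None or err < best[0]:
                    best = (err, theta, s)
        eps, theta, s = best
        eps = min(max(eps, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)   # importance of h_t
        ensemble.append((alpha, theta, s))
        # Reweight: raise weight of misclassified examples, then normalize
        D = [D[i] * math.exp(-alpha if (s if xs[i] > theta else -s) == ys[i]
                             else alpha) for i in range(m)]
        Z = sum(D)
        D = [d / Z for d in D]

    def H(x):                             # H(x) = sign(sum_t alpha_t h_t(x))
        score = sum(a * (s if x > th else -s) for a, th, s in ensemble)
        return 1 if score >= 0 else -1
    return H
```

Each round the distribution D shifts mass onto the examples the current ensemble gets wrong, forcing later weak classifiers to focus on them.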
Adaboost-Based Detector
[Viola and Jones 01]
Main idea:
Feature selection: select important features
Focus of attention: focus on potential regions
Use an integral image for fast feature evaluation
Use Adaboost to learn:
A set of important features (feature selection)
sort them in the order of importance
each feature can be used as a simple (weak) classifier
A cascade of classifiers that
combine all the weak classifiers to do a difficult task
filter out the regions that most likely do not contain faces
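The integral image mentioned above stores, at each pixel, the sum of all pixels above and to the left, so any rectangle sum (and hence any Haar-like feature) costs only four lookups. A minimal sketch, with illustrative function names:

```python
# Integral image for O(1) rectangle sums, as used for fast feature evaluation.
def integral_image(img):
    """img: 2-D list of pixel values; returns a same-size integral image."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                       # running sum of this row
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] via four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

A two-rectangle Haar feature is then just the difference of two `rect_sum` calls, independent of rectangle size.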
Feature Selection
[Viola and Jones 01]
Training: if x is a face, then x
most likely has feature 1 (the easiest feature, and of greatest importance)
very likely has feature 2 (an easy feature)
…
likely has feature n (a more complex feature, of less importance since it does not exist in all the faces in the training set)
Testing: given a test sub-image x':
if x' has feature 1:
  test whether x' has feature 2:
    …
      test whether x' has feature n
    else, x' is not a face
  else, x' is not a face
else, x' is not a face
Similar to a decision tree: one simple implementation
[Figure: decision cascade: x' passes feature 1 (Yes) on to feature 2, …, on to feature n; passing every test means x' is a face; a No at any stage means x' is a non-face]
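The nested tests above collapse to a simple early-exit loop. A sketch, where the feature tests are placeholders standing in for the learned weak classifiers:

```python
# Cascade test: reject a sub-image the moment any stage fails.
def cascade_classify(subimage, feature_tests):
    """feature_tests: ordered list of predicates, easiest/most important first."""
    for has_feature in feature_tests:
        if not has_feature(subimage):
            return False          # rejected early: non-face
    return True                   # passed every stage: face

# Hypothetical usage with toy integer "sub-images" and toy stages:
stages = [lambda x: x > 0, lambda x: x % 2 == 0, lambda x: x < 100]
```

Because the vast majority of sub-windows fail the first, cheapest stage, the expected cost per window is far below the cost of evaluating all n features.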