Recent Advances in Face Detection

(1)

Recent Advances in Face Detection

Ming-Hsuan Yang

myang@honda-ri.com

http://www.honda-ri.com
http://vision.ai.uiuc.edu/mhyang

Honda Research Institute

Mountain View, California, USA

(2)

Face Detection: A Solved Problem?

- Recent work has demonstrated excellent results
  - fast, multi pose, partial occlusion, …
- So, is face detection a solved problem?
- No, not quite…

Omron’s face detector [Liu et al. 04]

(3)

Outline

- Objective
  - Survey major face detection works
  - Address “how” and “why” questions
  - Pros and cons of detection methods
  - Future research directions
- Updated tutorial material

http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html

(4)

Face Detection

- Identify and locate human faces in an image regardless of their
  - position
  - scale
  - in-plane rotation
  - orientation
  - pose (out-of-plane rotation)
  - and illumination

Where are the faces, if any?

(5)

Why Face Detection is Important?

- First step for any fully automatic face recognition system
- First step in many surveillance systems
- Face is a highly non-rigid object
- Lots of applications
- A step towards Automatic Target Recognition (ATR) or generic object detection/recognition

(6)

In One Thumbnail Face Image

- Consider a thumbnail 19 × 19 face pattern
- $256^{361}$ possible combinations of gray values
  - $256^{361} = 2^{8 \times 361} = 2^{2888}$
- Total world population (as of 2004)
  - $6{,}400{,}000{,}000 \approx 2^{33}$
- The exponent is about 87 times larger ($2888 / 33 \approx 87.5$); the count itself is larger by a factor of about $2^{2855}$!
- Extremely high dimensional space!
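The arithmetic on this slide can be checked directly. The following is a minimal sketch using only Python's built-in integers; the variable names are illustrative and not part of the original material.

```python
# Count the distinct 19 x 19 gray-scale patterns and compare with the
# 2004 world population quoted on the slide.
PIXELS = 19 * 19          # 361 pixels in the thumbnail
GRAY_LEVELS = 256         # 8 bits per pixel

patterns = GRAY_LEVELS ** PIXELS            # exact big integer
pattern_bits = patterns.bit_length() - 1    # log2 of an exact power of two

world_population = 6_400_000_000
population_bits = world_population.bit_length()   # bits needed to count everyone

print("pattern exponent:", pattern_bits)
print("population bits:", population_bits)
print("exponent ratio:", pattern_bits / population_bits)
```

The ratio compares exponents, not counts, which is why the comparison is better read as "about 87 times as many bits".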

(7)

Why Face Detection Is Difficult?

- Pose (out-of-plane rotation): frontal, 45 degree, profile, upside down
- Presence or absence of structural components: beards, mustaches, and glasses
- Facial expression: face appearance is directly affected by a person's facial expression
- Occlusion: faces may be partially occluded by other objects
- Orientation (in-plane rotation): face appearance varies directly with rotation about the camera's optical axis
- Imaging conditions: lighting (spectra, source distribution and intensity), camera characteristics (sensor response, gain control, lenses), and resolution

(8)

Related Problems

- Face localization:
  - Aim to determine the image position of a single face
  - A simplified detection problem with the assumption that an input image contains only one face
- Facial feature extraction:
  - To detect the presence and location of features such as eyes, nose, nostrils, eyebrow, mouth, lips, ears, etc.
  - Usually assume that there is only one face in an image
- Face recognition (identification)
- Facial expression recognition
- Human pose estimation and tracking

(9)

Face Detection and Object Recognition

- Detection: concerned with a category of object
- Recognition: concerned with individual identity
- Face is a highly non-rigid object
- Many methods can be applied to other object detection/recognition

[Figure: car detection and pedestrian detection examples]

(10)

Human Detection and Tracking

- Often used as a salient cue for human detection
- Used as a strong cue to search for other body parts
- Used to detect new objects and re-initialize a tracker once it fails

[Lee and Cohen 04] [Okuma et al. 04]

(11)

Research Issues

- Representation: How to describe a typical face?
- Scale: How to deal with faces of different sizes?
- Search strategy: How to spot these faces?
- Speed: How to speed up the process?
- Precision: How to locate the faces precisely?
- Post processing: How to combine detection results?

(12)

Face Detector: Ingredients

- Target application domain: single image, video
- Representation: holistic, feature, etc.
- Pre processing: histogram equalization, etc.
- Cues: color, motion, depth, voice, etc.
- Search strategy: exhaustive, greedy, focus of attention, etc.
- Classifier design: ensemble, cascade
- Post processing: combining detection results

(13)

In This Tutorial

[Figure: taxonomy of face detection settings. Face Detection splits into Video and Single Image; each into Color and Gray Scale; further branches: Upright frontal, Pose, Rotation, Occlusion, Motion, Depth, Voice]

- Focus on detecting
  - upright, frontal faces
  - in a single gray-scale image
  - with decent resolution
  - under good lighting conditions
- See [Sinha 01] for detecting faces in low-resolution images

(14)

Methods to Detect/Locate Faces

- Knowledge-based methods:
  - Encode human knowledge of what constitutes a typical face (usually, the relationships between facial features)
- Feature invariant approaches:
  - Aim to find structural features of a face that exist even when the pose, viewpoint, or lighting conditions vary
- Template matching methods:
  - Several standard patterns stored to describe the face as a whole or the facial features separately
- Appearance-based methods:
  - The models (or templates) are learned from a set of training images which capture the representative variability of facial appearance

Many methods can be categorized in several ways

(15)

Agenda

- Detecting faces in gray scale images
  - Knowledge-based
  - Feature-based
  - Template-based
  - Appearance-based
- Detecting faces in color images
- Detecting faces in video
- Performance evaluation
- Research direction and concluding remarks

(16)

Knowledge-Based Methods

- Top-down approach: Represent a face using a set of human-coded rules
- Example:
  - The center part of face has uniform intensity values
  - The difference between the average intensity values of the center part and the upper part is significant
  - A face often appears with two eyes that are symmetric to each other, a nose and a mouth
- Use these rules to guide the search process

(17)

Knowledge-Based Method:

[Yang and Huang 94]

- Multi-resolution focus-of-attention approach
- Level 1 (lowest resolution): apply the rule “the center part of the face has 4 cells with a basically uniform intensity” to search for candidates
- Level 2: local histogram equalization followed by edge detection
- Level 3: search for eye and mouth features for validation
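The Level 1 rule can be illustrated with a small sketch. This is not the implementation of [Yang and Huang 94]; the cell size, the `tolerance` threshold, and the function names are hypothetical choices made only to show how a low-resolution mosaic and a uniformity test might fit together.

```python
def mosaic(image, cell):
    """Average non-overlapping cell x cell blocks to get a low-resolution image."""
    rows, cols = len(image) // cell, len(image[0]) // cell
    return [[sum(image[r * cell + i][c * cell + j]
                 for i in range(cell) for j in range(cell)) / (cell * cell)
             for c in range(cols)]
            for r in range(rows)]


def center_is_uniform(cells, r, c, tolerance=10):
    """Rule: the 2 x 2 block of mosaic cells at (r, c) has nearly uniform intensity.

    tolerance is an assumed threshold on the intensity spread.
    """
    block = [cells[r][c], cells[r][c + 1], cells[r + 1][c], cells[r + 1][c + 1]]
    return max(block) - min(block) <= tolerance


flat = [[100] * 4 for _ in range(4)]
mixed = [[0, 0, 200, 200], [0, 0, 200, 200], [100] * 4, [100] * 4]
print(center_is_uniform(mosaic(flat, 2), 0, 0))
print(center_is_uniform(mosaic(mixed, 2), 0, 0))
```

Candidates that pass such a coarse test would then be examined at the finer levels.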

(18)

Knowledge-Based Method :

[Kotropoulos & Pitas 94]

- Horizontal/vertical projection to search for candidates
- Search eyebrow/eyes, nostrils/nose for validation
- Difficult to detect multiple people or in complex background

$$HI(x) = \sum_{y=1}^{n} I(x, y), \qquad VI(y) = \sum_{x=1}^{m} I(x, y)$$

[Kotropoulos & Pitas 94]
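The projection profiles are simple sums along each image axis. A minimal sketch, assuming the image is stored as a nested list `image[x][y]` with `m` entries along `x` and `n` along `y` (an indexing convention chosen here for illustration):

```python
def projections(image):
    """Return (HI, VI) where HI[x] sums I(x, y) over y and VI[y] sums over x."""
    m, n = len(image), len(image[0])
    hi = [sum(image[x][y] for y in range(n)) for x in range(m)]
    vi = [sum(image[x][y] for x in range(m)) for y in range(n)]
    return hi, vi


hi, vi = projections([[1, 2, 3],
                      [4, 5, 6]])
print(hi, vi)
```

Local minima of these profiles are the kind of cue a projection-based method could use to propose candidate locations of the head boundary, eyes, and mouth.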

(19)

Knowledge-based Methods: Summary

- Pros:
  - Easy to come up with simple rules to describe the features of a face and their relationships
  - Based on the coded rules, facial features in an input image are extracted first, and face candidates are identified
  - Work well for face localization in uncluttered background
- Cons:
  - Difficult to translate human knowledge into rules precisely: detailed rules fail to detect faces and general rules may find many false positives
  - Difficult to extend this approach to detect faces in different poses: implausible to enumerate all the possible cases

(21)

Feature-Based Methods

- Bottom-up approach: Detect facial features (eyes, nose, mouth, etc.) first
- Facial features: edge, intensity, shape, texture, color, etc.
- Aim to detect invariant features
- Group features into candidates and verify them

(22)

Random Graph Matching

[Leung et al. 95]

- Formulate as a problem to find the correct geometric arrangement of facial features
- Facial features are defined by the average responses of multi-orientation, multi-scale Gaussian derivative filters
- Learn the configuration of features with Gaussian distribution of mutual distance between facial features
- Convolve an image with Gaussian filters to locate candidate features based on similarity
- Random graph matching among the candidates to locate faces

(23)

Feature Grouping

[Yow and Cipolla 90]

- Apply a 2nd derivative Gaussian filter to search for interest points
- Group the edges near interest points into regions
- Each feature and grouping is evaluated within a Bayesian network
- Handle a few poses
- See also [Amit et al. 97] for efficient hierarchical (focus of attention) feature-based method

[Figure panels: face model and component; model facial feature as pair of edges; apply interest point operator and edge detector to search for features; using Bayesian network to combine evidence]

(24)

Feature-Based Methods: Summary

- Pros:
  - Features are invariant to pose and orientation change
- Cons:
  - Difficult to locate facial features due to several kinds of corruption (illumination, noise, occlusion)
  - Difficult to detect features in complex background

(26)

Template Matching Methods

- Store a template
  - Predefined: based on edges or regions
  - Deformable: based on facial contours (e.g., Snakes)
- Templates are hand-coded (not learned)
- Use correlation to locate faces
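Locating a template by correlation can be sketched as follows. The slides do not say which correlation measure is used; normalized cross-correlation is assumed here because it is insensitive to uniform brightness and contrast changes. The image, template, and function names are illustrative.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized flat lists."""
    n = len(template)
    mean_p, mean_t = sum(patch) / n, sum(template) / n
    dp = [p - mean_p for p in patch]
    dt = [t - mean_t for t in template]
    denom = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dt))
    if denom == 0:
        return 0.0  # flat patch or flat template: correlation is undefined
    return sum(a * b for a, b in zip(dp, dt)) / denom


def best_match(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best = (-2.0, None)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = ncc(patch, flat_t)
            if score > best[0]:
                best = (score, (r, c))
    return best


template = [[1, 9], [9, 1]]
# The same pattern with changed brightness and contrast, placed at row 1, column 2.
image = [[0, 0, 0, 0],
         [0, 0, 5, 21],
         [0, 0, 21, 5],
         [0, 0, 0, 0]]
score, position = best_match(image, template)
print(score, position)
```

An exhaustive scan like this is simple but slow, and a single fixed template does not cope with scale or pose changes.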

(27)

Face Template

- Use relative pair-wise ratios of the brightness of facial regions (14 × 16 pixels): the eyes are usually darker than the surrounding face [Sinha 94]
- Use average area intensity values rather than absolute pixel values
- See also Point Distribution Model (PDM) [Lanitis et al. 95]

[Figure: ratio template [Sinha 94]; average shape [Lanitis et al. 95]]

(28)

Template-Based Methods: Summary

- Pros:
  - Simple
- Cons:
  - Templates need to be initialized near the face images
  - Difficult to enumerate templates for different poses (similar to knowledge-based methods)

(30)

Appearance-Based Methods

- Train a classifier using positive (and usually negative) examples of faces
- Representation
- Pre processing
- Train a classifier
- Search strategy
- Post processing
- View-based

(31)

Appearance-Based Methods: Classifiers

- Neural network: Multilayer Perceptrons
- Principal Component Analysis (PCA), Factor Analysis
- Support vector machine (SVM)
- Mixture of PCA, Mixture of factor analyzers
- Distribution-based method
- Naïve Bayes classifier
- Hidden Markov model
- Sparse network of winnows (SNoW)
- Kullback relative information
- Inductive learning: C4.5
- Adaboost
- …

(32)

Representation

- Holistic: Each image is raster scanned and represented by a vector of intensity values
- Block-based: Decompose each face image into a set of overlapping or non-overlapping blocks
  - At multiple scales
  - Further processed with vector quantization, Principal Component Analysis, etc.

(33)

Face and Non-Face Exemplars

- Positive examples:
  - Get as much variation as possible
  - Manually crop and normalize each face image into a standard size (e.g., 19 × 19 pixels)
  - Creating virtual examples [Sung and Poggio 94]
- Negative examples:
  - Fuzzy idea
  - Any images that do not contain faces
  - A large image subspace
  - Bootstrapping [Sung and Poggio 94]

(34)

Distribution-Based Method

[Sung & Poggio 94]

- Masking: reduce the unwanted background noise in a face pattern
- Illumination gradient correction: find the best fit brightness plane and then subtract it from the pattern to reduce heavy shadows caused by extreme lighting angles
- Histogram equalization: compensates for the imaging effects due to changes in illumination and different camera input gains
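Of the three preprocessing steps, histogram equalization is the easiest to show in a few lines. The sketch below uses the standard cumulative-histogram mapping on a flat list of 8-bit gray values; it is a generic textbook formulation, not necessarily the exact variant used in [Sung & Poggio 94].

```python
def equalize(pixels, levels=256):
    """Map gray values so that their cumulative histogram becomes roughly linear."""
    if not pixels:
        return []
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)   # CDF value at the darkest level present
    if n == cdf_min:
        return list(pixels)                   # constant image: nothing to stretch
    return [round((cdf[v] - cdf_min) * (levels - 1) / (n - cdf_min)) for v in pixels]


window = [52, 52, 60, 60, 70, 70, 80, 80]
print(equalize(window))
```

A narrow band of gray values is spread over the full range, which reduces the effect of overall brightness and camera gain on the pattern presented to the classifier.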

(35)

Creating Virtual Positive Examples

- Simple and very effective method
- Randomly mirror, rotate, translate and scale face samples by small amounts
- Increase number of training examples
- Less sensitive to alignment error

[Figure: randomly mirrored, rotated, translated, and scaled faces [Sung & Poggio 94]]
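A minimal sketch of generating virtual examples is given below. For brevity it covers only mirroring and one-pixel translation; rotation and scaling, which the slide also lists, would need interpolation and are omitted. The function names, the `max_shift` parameter, and the fixed seed are assumptions made for illustration.

```python
import random


def mirror(image):
    """Flip an image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def translate(image, dx, dy, fill=0):
    """Shift an image by (dx, dy) pixels, padding uncovered pixels with fill."""
    rows, cols = len(image), len(image[0])
    return [[image[r - dy][c - dx]
             if 0 <= r - dy < rows and 0 <= c - dx < cols else fill
             for c in range(cols)]
            for r in range(rows)]


def virtual_examples(image, count, max_shift=1, seed=0):
    """Create count randomly mirrored and slightly shifted copies of image."""
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        sample = mirror(image) if rng.random() < 0.5 else [row[:] for row in image]
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        examples.append(translate(sample, dx, dy))
    return examples


face = [[1, 2], [3, 4]]
extra = virtual_examples(face, 5)
print(len(extra))
```

Training on such perturbed copies is what makes the classifier tolerant of small cropping and alignment errors.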

(36)

Distribution of Face/Non-face Pattern

- Cluster face and non-face samples into a few (i.e., 6) clusters using the K-means algorithm
- Each cluster is modeled by a multi-dimensional Gaussian with a centroid and covariance matrix
- Approximate each Gaussian covariance with a subspace (i.e., using the eigenvectors with the largest eigenvalues)
- See [Moghaddam and Pentland 97] on distribution-based learning using Gaussian mixture model

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

$\mathbf{x}$: face, non-face samples

[Sung & Poggio 94]

(37)

Distance Metrics

[Sung & Poggio 94]

- Compute distances of a sample to all the face and non-face clusters
- Each distance measure has two parts:
  - Within subspace distance ($D_1$): Mahalanobis distance of the projected sample to cluster center
  - Distance to the subspace ($D_2$): distance of the sample to the subspace
- Feature vector: Each face/non-face sample is represented by a vector of these distance measurements
- Train a multilayer perceptron using the feature vectors for face detection

- 6 face clusters
- 6 non-face clusters
- 2 distance values per cluster
- 24 measurements

$$D_1 = \frac{1}{2}\left(d \ln 2\pi + \ln|\Sigma| + (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

$$D_2 = \|\mathbf{x} - \mathbf{x}_p\|^2 = \|(I - E_{75} E_{75}^T)(\mathbf{x} - \boldsymbol{\mu})\|^2$$

[Sung and Poggio 94]
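The two distance components can be sketched for a single cluster as follows. This is an illustrative reading of the formulas rather than the original implementation: the covariance is assumed to be given by its leading orthonormal eigenvectors and eigenvalues, $d$ is taken to be the subspace dimension, and a toy 2-D sample with a 1-D subspace stands in for the 75-dimensional subspace on the slide.

```python
import math


def subspace_distances(x, mu, eigvecs, eigvals):
    """Return (D1, D2) of sample x for one cluster.

    eigvecs: orthonormal leading eigenvectors of the cluster covariance.
    eigvals: the matching eigenvalues (variances along those directions).
    """
    diff = [a - b for a, b in zip(x, mu)]
    # Coordinates of (x - mu) inside the subspace.
    coeffs = [sum(e_i * d_i for e_i, d_i in zip(e, diff)) for e in eigvecs]
    k = len(eigvecs)
    mahalanobis = sum(c * c / lam for c, lam in zip(coeffs, eigvals))
    log_det = sum(math.log(lam) for lam in eigvals)
    d1 = 0.5 * (k * math.log(2 * math.pi) + log_det + mahalanobis)
    # Squared length of the part of (x - mu) that the subspace cannot explain.
    d2 = sum(d * d for d in diff) - sum(c * c for c in coeffs)
    return d1, d2


d1, d2 = subspace_distances([2, 3], [0, 0], [[1, 0]], [4])
print(d1, d2)
```

Repeating this for the 6 face and 6 non-face clusters yields the 24 measurements that form the feature vector.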

(38)

Bootstrapping

[Sung and Poggio 94]

1. Start with a small set of non-face examples in the training set
2. Train a MLP classifier with the current training set
3. Run the learned face detector on a sequence of random images
4. Collect all the non-face patterns that the current system wrongly classifies as faces (i.e., false positives)
5. Add these non-face patterns to the training set
6. Go to Step 2 or stop if satisfied

⇒ Improve the system performance greatly

[Figure: test image; false positive detects]
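The bootstrapping loop can be sketched with a toy classifier. To keep the example self-contained, a one-dimensional threshold stands in for the MLP, and patterns are plain numbers; both are simplifying assumptions, and only the control flow mirrors the six steps.

```python
def train(positives, negatives):
    """Toy stand-in for the MLP: a threshold halfway between the two classes."""
    return (min(positives) + max(negatives)) / 2


def is_face(threshold, value):
    return value > threshold


def bootstrap(positives, negatives, background, max_rounds=10):
    """Retrain until the detector fires on no face-free background pattern."""
    negatives = list(negatives)
    threshold = train(positives, negatives)              # step 2
    for _ in range(max_rounds):
        false_positives = [v for v in background         # steps 3 and 4
                           if is_face(threshold, v) and v not in negatives]
        if not false_positives:                          # step 6: stop if satisfied
            break
        negatives.extend(false_positives)                # step 5
        threshold = train(positives, negatives)          # back to step 2
    return threshold, negatives


faces = [10, 11, 12]
background = list(range(9))      # patterns known to contain no face
threshold, negatives = bootstrap(faces, [0], background)
print(threshold, negatives)
```

The negative set grows only with patterns the current detector gets wrong, which is why a small initial set of non-faces is enough.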

(39)

Search over Space and Scale

- Scan an input image at one-pixel increments horizontally and vertically
- Downsample the input image by a factor of 1.2 and continue to search

(40)

Continue to Search over Space and Scale

Continue to downsample the input image and search until the image size is too small
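The cost of this exhaustive search is easy to estimate. The sketch below enumerates the pyramid levels and counts the windows at each level; the 19 × 19 window and the factor 1.2 come from the slides, while the image size and function names are hypothetical.

```python
def pyramid_sizes(width, height, window=19, factor=1.2):
    """Image sizes visited when downsampling by factor until the window no longer fits."""
    sizes = []
    level = 0
    while True:
        scale = factor ** level
        w, h = int(width / scale), int(height / scale)
        if w < window or h < window:
            break
        sizes.append((w, h))
        level += 1
    return sizes


def count_windows(width, height, window=19):
    """Number of window positions when scanning at one-pixel increments."""
    return (width - window + 1) * (height - window + 1)


sizes = pyramid_sizes(50, 40)
total = sum(count_windows(w, h) for w, h in sizes)
print(sizes)
print(total)
```

Even a tiny 50 × 40 image produces over a thousand windows to classify, which is why search strategy and speed are listed as separate research issues.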

(41)

Experimental Results:

[Sung and Poggio 94]

- Can have multiple detects of a face since it may be detected
  - at different scale
  - at a slightly displaced window location
- Able to detect upright frontal faces

(42)

Neural Network-Based Detector

- Train multiple multilayer perceptrons with different receptive fields [Rowley and Kanade 96].
- Merging the overlapping detections within one network
- Train an arbitration network to combine the results from different networks
- Needs to find the right neural network architecture (number of layers, hidden units, etc.) and parameters (learning rate, etc.)

(43)

Dealing with Multiple Detects

- Merging overlapping detections within one network [Rowley and Kanade 96]
- Arbitration among multiple networks
  - AND operator
  - OR operator
  - Voting
  - Arbitration network

[Figure: merging overlapping results; ANDing results from two networks]
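A simplified sketch of the two post-processing ideas is given below. It is a generic heuristic, not the procedure of [Rowley and Kanade 96]: detections are reduced to window centers, nearby detections are merged greedily into centroids, and the AND operator keeps a detection from one network only if the other network fires nearby. The `radius` parameter and function names are assumptions.

```python
import math


def merge_detections(detections, radius):
    """Greedily group detections whose centers lie within radius of a cluster centroid.

    Returns a list of (centroid_x, centroid_y, count).
    """
    clusters = []  # each entry: [sum_x, sum_y, count]
    for x, y in detections:
        for cluster in clusters:
            cx, cy = cluster[0] / cluster[2], cluster[1] / cluster[2]
            if math.hypot(x - cx, y - cy) <= radius:
                cluster[0] += x
                cluster[1] += y
                cluster[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return [(sx / n, sy / n, n) for sx, sy, n in clusters]


def and_detections(first, second, radius):
    """Keep detections of the first network confirmed by the second within radius."""
    return [(x, y) for x, y in first
            if any(math.hypot(x - u, y - v) <= radius for u, v in second)]


merged = merge_detections([(10, 10), (11, 10), (12, 11), (50, 50)], radius=3)
confirmed = and_detections([(10, 10), (50, 50)], [(11, 10)], radius=3)
print(merged)
print(confirmed)
```

The count returned with each centroid can serve as a confidence cue, since isolated single detections are more likely to be false positives.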

(44)

Experimental Results:

[Rowley et al. 96]

(45)

Detecting Rotated Faces

[Rowley et al. 98]

- A router network is trained to estimate the angle of an input window
  - If it contains a face, the router returns the angle of the face and the face can be rotated back to upright frontal position.
  - Otherwise the router returns a meaningless angle
- The de-rotated window is then applied to a detector (previously trained for upright frontal faces)

(46)

Router Network

[Rowley et al. 98]

- Rotate a face sample at 10 degree increments
- Create virtual examples (translation and scaling) from each sample
- Train a multilayer neural network with input-output pairs

[Figure: input-output pair to train a router network]

(47)

Experimental Results

[Rowley et al. 98]

- Able to detect rotated faces with good results
- Performance degrades in detecting upright frontal faces due to the use of router network
- See also [Feraud et al. 01]

(48)

Support Vector Machine (SVM)

- Find the optimal separating hyperplane constructed by support vectors [Vapnik 95]
- Maximize the distance from the separating hyperplane to the closest data points (large margin classifier)
- Formulated as a quadratic programming problem
- Kernel functions for nonlinear SVMs

[Figure: separating hyperplane with support vectors, distance d, and margin]

(49)

SVM-Based Face Detector

[Osuna et al. 97]

- Adopt an architecture similar to [Sung and Poggio 94] with the SVM classifier
- Pros: Good recognition rate with theoretical support
- Cons:
  - Time consuming in training and testing
  - Need to pick the right kernel

(50)

SVM-Based Face Detector: Issues

- Training: Solve a complex quadratic optimization problem
  - Speed-up: Sequential Minimal Optimization (SMO) [Platt 99]
- Testing: The number of support vectors may be large → lots of kernel computations
  - Speed-up: Reduced set of support vectors [Romdhani et al. 01]
- Variants:
  - Component-based SVM [Heisele et al. 01]:
    - Learn components and their geometric configuration
    - Less sensitive to pose variation
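The testing cost mentioned above follows from the form of the kernel SVM decision function, $f(\mathbf{x}) = \sum_i \alpha_i y_i K(\mathbf{s}_i, \mathbf{x}) + b$, which needs one kernel evaluation per support vector. A minimal sketch with an RBF kernel; the support vectors, coefficients, and `gamma` are made-up values for illustration, not a trained model.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def svm_decision(x, support_vectors, coeffs, bias=0.0, gamma=1.0):
    """f(x) = sum_i coeff_i * K(s_i, x) + bias, where coeff_i = alpha_i * y_i.

    One kernel evaluation per support vector, so cost grows with their number.
    """
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, coeffs)) + bias


support_vectors = [(0.0, 0.0), (2.0, 0.0)]   # hypothetical support vectors
coeffs = [1.0, -1.0]                          # alpha_i * y_i for each of them
print(svm_decision((0.0, 0.0), support_vectors, coeffs))
print(svm_decision((2.0, 0.0), support_vectors, coeffs))
```

Reduced-set methods shrink the list of vectors in this sum, which directly cuts the per-window cost.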

(51)

Sparse Network of Winnows

[Roth 98]

- On line, mistake driven algorithm
- Attribute (feature) efficiency
- Allocation of nodes and links is data driven
  - complexity depends on number of active features
- Allows for combining tasks hierarchically
- Multiplicative learning rule

[Figure: network of feature nodes linked to target nodes]

(52)

SNoW-Based Face Detector

[Yang et al. 00]

- Multiplicative weight update algorithm:

  Prediction is 1 iff $w \cdot x > \theta$
  If Class = 1 but $w \cdot x \le \theta$, $w_i = \alpha w_i$ (if $x_i = 1$) (promotion)
  If Class = 0 but $w \cdot x > \theta$, $w_i = \beta w_i$ (if $x_i = 1$) (demotion)
  Usually, $\alpha = 2$, $\beta = 0.5$

- Pros: On-line feature selection
- Cons: Need more powerful feature representation scheme
- Has also been applied to object recognition [Yang et al. 02]
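The promotion/demotion rule can be sketched as a single Winnow unit over binary features. This is a minimal illustration, not the SNoW architecture itself. The default threshold (the number of features), the epoch loop, and the toy data are assumptions made for the example.

```python
def winnow_train(examples, n_features, theta=None, alpha=2.0, beta=0.5, epochs=10):
    """Train one Winnow unit on (active_feature_indices, label) pairs.

    Labels are 1 or 0; features are binary, so each example lists the
    indices i with x_i = 1.
    """
    if theta is None:
        theta = float(n_features)      # assumed default threshold
    w = [1.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for active, label in examples:
            predicted = 1 if sum(w[i] for i in active) > theta else 0
            if predicted == label:
                continue               # mistake driven: no update when correct
            mistakes += 1
            factor = alpha if label == 1 else beta   # promotion / demotion
            for i in active:           # only active features are updated
                w[i] *= factor
        if mistakes == 0:
            break
    return w, theta

def winnow_predict(w, theta, active):
    return 1 if sum(w[i] for i in active) > theta else 0

# Toy concept: the label is 1 exactly when feature 0 is active.
examples = [
    ((0, 1), 1), ((0, 3), 1), ((1, 2), 0),
    ((2, 3, 4), 0), ((0, 5), 1), ((4, 5), 0),
]
w, theta = winnow_train(examples, n_features=6)
```

Because only active features are touched on a mistake, the work per example scales with the number of active features rather than with the full feature space.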

(53)

Probabilistic Modeling of Local Appearance

[Schneiderman and Kanade 98]

- Using local appearance
- Learn the distribution by parts using Naïve Bayes classifier
- Apply Bayesian decision rule
- Further decompose the appearance into space, frequency, and orientation
- Learn the joint distribution of object and position
- Also wavelet representation

$$p(\text{region} \mid \text{object}) = \prod_{k=1}^{n} p(\text{subregion}_k \mid \text{object})$$

[Figure: p(region | face) = p(subregion | face) * p(subregion | face) * ... or p(subregion, x, y, s | face) * ...]

$$\frac{p(\text{region} \mid \text{object})}{p(\text{region} \mid \overline{\text{object}})} > \lambda = \frac{p(\overline{\text{object}})}{p(\text{object})}$$
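The product of subregion likelihoods and the ratio test can be sketched in the log domain, where the product becomes a sum. The probability tables, pattern labels, and prior below are made-up illustrative values, not estimates from the cited work.

```python
import math

def log_likelihood_ratio(subregions, p_given_face, p_given_nonface):
    """Sum of per-subregion log ratios (the log of the naive Bayes product).

    subregions: one quantized pattern label per subregion (hypothetical).
    p_given_*: dicts mapping a pattern label to its estimated probability.
    """
    return sum(
        math.log(p_given_face[s]) - math.log(p_given_nonface[s])
        for s in subregions
    )

def is_face(subregions, p_given_face, p_given_nonface, prior_face):
    # Threshold lambda = P(non-object) / P(object), compared in the log domain.
    log_lambda = math.log((1.0 - prior_face) / prior_face)
    return log_likelihood_ratio(subregions, p_given_face, p_given_nonface) > log_lambda

# Toy tables over three quantized patterns (made-up numbers).
p_face = {"eye": 0.5, "nose": 0.3, "flat": 0.2}
p_nonface = {"eye": 0.1, "nose": 0.1, "flat": 0.8}

window_a = ["eye", "eye", "nose"]    # ratio 5 * 5 * 3 = 75
window_b = ["flat", "flat", "nose"]  # ratio 0.25 * 0.25 * 3 = 0.1875
```

With an assumed prior of 0.1 for faces, the threshold is 9, so `window_a` is accepted and `window_b` is rejected.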

(54)

Detecting Faces in Different Poses

- Extend to detect faces in different poses with multiple detectors
- Each detector specializes to a view: frontal, left pose and right pose
- [Mikolajczyk et al. 01] extend this to detect faces from side pose to frontal view

[Schneiderman and Kanade 98]

(55)

Experimental Results

[Schneiderman and Kanade 98]

Able to detect profile faces [Schneiderman and Kanade 98]

Extended to detect cars [Schneiderman and Kanade 00]

(56)

Mixture of Factor Analyzers

[Yang et al. 00]

- Generative method that performs clustering and dimensionality reduction within each cluster
- Similar to probabilistic PCA but has more merits
  - proper density model
  - robust to noise
- Use mixture model to detect faces in different poses
- Use EM to estimate all the parameters in the mixture model
- See also [Moghaddam and Pentland 97] on using probabilistic Gaussian mixture for object localization

Factor analysis (hidden factor z, observation x, parameters Λ, Ψ):

$$x = \Lambda z + u, \qquad p(x \mid z) = \mathcal{N}(\Lambda z, \Psi)$$

Mixture model (parameters Λ_j, µ_j, Ψ, π; component indicator ω):

$$p(x \mid z, \omega_j) = \mathcal{N}(\mu_j + \Lambda_j z, \Psi)$$

[Figure: factor faces for frontal view]
[Figure: factor faces for 45°]
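Integrating out the hidden factor gives the density that a detector would evaluate on an image window. The sketch below relies on the usual factor-analysis assumptions, which the slide does not state explicitly: $z \sim \mathcal{N}(0, I)$, noise $u \sim \mathcal{N}(0, \Psi)$ with diagonal $\Psi$ and independent of $z$, and mixing proportions $\pi_j = P(\omega_j)$.

```latex
\begin{aligned}
\text{single analyzer:}\quad
  p(x) &= \int p(x \mid z)\, p(z)\, dz
        = \mathcal{N}\!\left(0,\ \Lambda \Lambda^{\top} + \Psi\right) \\
\text{mixture:}\quad
  p(x) &= \sum_{j} \pi_j \int p(x \mid z, \omega_j)\, p(z)\, dz
        = \sum_{j} \pi_j\, \mathcal{N}\!\left(\mu_j,\ \Lambda_j \Lambda_j^{\top} + \Psi\right)
\end{aligned}
```

Each component is a Gaussian whose covariance is low-rank plus diagonal, which is how the model clusters and reduces dimensionality at once.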

(57)

Fisher Linear Discriminant

[Yang et al. 00]

- Fisherface (FLD) demonstrated good results in face recognition
- Apply Self-Organizing Map (SOM) to cluster faces/non-faces, and thereby obtain labels for samples
- Apply FLD to find optimal projection matrix for maximal separation
- Estimate class-conditional density for detection

[Figure: pipeline. Given a set of unlabeled face and non-face samples → SOM → face/non-face prototypes generated by SOM → FLD → class-conditional density → maximum likelihood estimation]
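The FLD projection step can be sketched for two classes in 2-D, where the direction is proportional to $S_W^{-1}(m_a - m_b)$. This is a simplified two-class illustration with made-up points; the method on the slide works with multiple SOM-generated prototypes and a projection matrix, which this sketch does not reproduce.

```python
def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]

def scatter(points, m):
    """2x2 scatter matrix: sum of (p - m)(p - m)^T over the class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c]
    return s

def fisher_direction(class_a, class_b):
    """Return w proportional to S_W^{-1} (m_a - m_b) for 2-D data."""
    m_a, m_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
    sw = [[s_a[r][c] + s_b[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    # Closed-form 2x2 inverse applied to the mean difference.
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]

def project(w, p):
    return w[0] * p[0] + w[1] * p[1]

# Made-up 2-D samples; the classes overlap along each original axis.
faces = [(2.0, 2.1), (2.5, 2.4), (3.0, 3.2), (3.5, 3.4)]
nonfaces = [(2.0, 0.9), (2.5, 1.6), (3.0, 1.8), (3.5, 2.6)]
w = fisher_direction(faces, nonfaces)
```

On this toy data neither coordinate separates the classes by itself, while the one-dimensional projection onto `w` does. A class-conditional density could then be fitted to the projected values.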

(58)

Adaboost

[Freund and Schapire 95]

- Use a set of weak classifiers ($\epsilon_t < 0.5$) and weighting on difficult examples for learning (sampling is based on the weights)
- Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y = \{-1, +1\}$
  Initialize $D_1(i) = 1/m$
  For $t = 1, \ldots, T$:
  - Train a weak classifier using distribution $D_t$
    1. Get a weak hypothesis $h_t: X \to \{-1, +1\}$ with error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \ne y_i]$
    2. Importance of $h_t$: $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
    3. Update:
       $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times e^{-\alpha_t}$ if $h_t(x_i) = y_i$ (correctly classified)
       $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times e^{\alpha_t}$ if $h_t(x_i) \ne y_i$ (incorrectly classified)
       where $Z_t$ is a normalization factor
- Aggregating the classifiers: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
- Performs well and tends not to overfit in empirical studies
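The loop above can be sketched with threshold stumps on 1-D data as the weak classifiers. The stump learner, the toy data set, and the function names are assumptions made for illustration. The sketch reweights examples rather than resampling them.

```python
import math

def best_stump(xs, ys, weights):
    """Pick the (threshold, polarity) stump with the lowest weighted error."""
    best = None
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    for thr in thresholds:
        for polarity in (+1, -1):
            preds = [polarity if x > thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, rounds):
    m = len(xs)
    dist = [1.0 / m] * m                     # D_1(i) = 1/m
    ensemble = []                            # (alpha_t, threshold, polarity)
    for _ in range(rounds):
        err, thr, pol = best_stump(xs, ys, dist)
        if err >= 0.5:                       # no better than chance: stop
            break
        err = max(err, 1e-10)                # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, thr, pol))
        preds = [pol if x > thr else -pol for x in xs]
        # exp(-alpha) when correct (y * p = +1), exp(+alpha) when wrong (y * p = -1)
        dist = [d * math.exp(-alpha * y * p) for d, y, p in zip(dist, ys, preds)]
        z = sum(dist)                        # normalization factor Z_t
        dist = [d / z for d in dist]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x > thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
```

No single stump can fit this labeling: one round leaves three points misclassified, while the weighted vote of three stumps classifies all ten training points correctly.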

(59)

Adaboost-Based Detector

[Viola and Jones 01]

- Main idea:
  - Feature selection: select important features
  - Focus of attention: focus on potential regions
  - Use an integral image for fast feature evaluation
- Use Adaboost to learn
  - A set of important features (feature selection)
    - sort them in the order of importance
    - each feature can be used as a simple (weak) classifier
  - A cascade of classifiers that
    - combine all the weak classifiers to do a difficult task
    - filter out the regions that most likely do not contain faces
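Fast feature evaluation with an integral image can be sketched as follows. A zero-padded summed-area table lets any rectangle sum be read with four lookups, and a rectangle feature is a difference of such sums. The left-minus-right two-rectangle feature and the tiny image are illustrative choices, not the detector's actual feature set.

```python
def integral_image(img):
    """ii[r][c] = sum of img over rows < r and columns < c (zero-padded)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a window (width must be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
```

Here `rect_sum(ii, 1, 1, 2, 2)` returns 34 (6 + 7 + 10 + 11) and `two_rect_feature(ii, 0, 0, 3, 4)` returns -12 (33 - 45). The cost of a feature does not depend on the rectangle size.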

(60)

Feature Selection

[Viola and Jones 01]

- Training: If x is a face, then x
  - most likely has feature 1 (easiest feature, and of greatest importance)
  - very likely to have feature 2 (easy feature)
  - …
  - likely to have feature n (more complex feature, and of less importance since it does not exist in all the faces in the training set)
- Testing: Given a test sub-image x’
  - if x’ has feature 1:
    - Test whether x’ has feature 2
      - … Test whether x’ has feature n
      - else …
    - else, it is not a face
  - else, it is not a face
- Similar to decision tree

[Figure: one simple implementation, similar to a decision tree. x’ is tested on feature 1, feature 2, …, feature n in turn; a "No" at any test labels x’ a non-face, and a "Yes" at every test labels x’ a face]
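The testing procedure above amounts to an early-exit chain of tests. A minimal sketch follows, in which each stage is a hypothetical scoring function with a threshold and a window is a dictionary of precomputed feature responses. These are stand-ins for illustration, not learned classifiers.

```python
def run_cascade(window, stages):
    """Return (is_face, stages_evaluated) for one sub-image.

    stages: list of (score_fn, threshold), ordered from the easiest and
    most important feature to the most complex. The window is rejected
    at the first stage whose score falls below the threshold.
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, index      # early exit: not a face
    return True, len(stages)

# Hypothetical stand-ins for learned features.
stages = [
    (lambda w: w["feature_1"], 0.5),
    (lambda w: w["feature_2"], 0.5),
    (lambda w: w["feature_3"], 0.5),
]

background = {"feature_1": 0.1, "feature_2": 0.9, "feature_3": 0.9}
near_miss = {"feature_1": 0.8, "feature_2": 0.7, "feature_3": 0.2}
face = {"feature_1": 0.9, "feature_2": 0.8, "feature_3": 0.7}
```

`run_cascade(background, stages)` stops after one stage, while `near_miss` and `face` both reach the third stage. Because most sub-images in a photograph are background, most of them exit after the first test.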
