Making faces
Digital Visual Effects, Spring 2005 Yung-Yu Chuang
2005/6/8
with slides by Richard Szeliski, Steve Seitz and Alex Efros
Outline
• 3D acquisition for faces
• Statistical methods
• Face models from single images
• Image-based faces
• Relighting for faces
Cyberware scanners
[Figure: face & head scanner; whole body scanner]
Face modeling and view-dependent texture mapping
• Procedure:
  1. Take multiple photographs of a person
  2. Establish corresponding feature points
  3. Recover 3D points and camera parameters
  4. Deform generic face model to fit points
  5. Extract textures from photos
Reconstruct a 3D model
[Figure: input photographs + generic 3D face model → pose estimation → more features → deformed model]
Mesh deformation
– Compute displacement of feature points
– Apply scattered data interpolation
[Figure: generic model → displacement → deformed model]
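The scattered-data-interpolation step can be sketched with radial basis functions: displacements known at the feature points are propagated to every remaining vertex. The function names and the linear kernel here are illustrative assumptions, not the paper's exact formulation, though Pighin et al. use a similar radial-basis scheme.

```python
import numpy as np

def rbf_interpolate(feat_pts, feat_disp, query_pts, eps=1e-8):
    """feat_pts: (m, 3) feature points; feat_disp: (m, 3) known displacements;
       query_pts: (n, 3) mesh vertices to deform."""
    def kernel(r):
        return r  # linear radial kernel phi(r) = r (a common, simple choice)
    # Solve K w = displacements for the RBF weights
    K = kernel(np.linalg.norm(feat_pts[:, None] - feat_pts[None], axis=-1))
    K += eps * np.eye(len(feat_pts))          # small regularization
    w = np.linalg.solve(K, feat_disp)
    # Evaluate the interpolant at the query vertices
    Kq = kernel(np.linalg.norm(query_pts[:, None] - feat_pts[None], axis=-1))
    return Kq @ w
```

Evaluating the interpolant back at the feature points reproduces the known displacements, which is the defining property of the interpolation.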
Texture extraction
• The texture color is a weighted combination of the colors in the photos
• Texture can be:
  – View-independent
  – View-dependent
• Considerations for weighting:
  – Occlusion
  – Smoothness
  – Positional certainty
  – View similarity
[Figure: view-independent vs. view-dependent texture extraction]
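The view-similarity consideration can be sketched as weighting each photo by how closely its viewing direction matches the rendering direction. The cosine weighting below is an illustrative choice; the full method also accounts for occlusion, smoothness, and positional certainty.

```python
import numpy as np

def view_weights(photo_dirs, render_dir):
    """photo_dirs: (n, 3) unit view directions of the source photos;
       render_dir: (3,) unit direction of the novel view."""
    sim = photo_dirs @ render_dir          # cosine view similarity
    w = np.clip(sim, 0, None)              # ignore back-facing views
    return w / w.sum()                     # normalized blend weights
```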
Use images to adapt a generic face model.
Creating new expressions
• In addition to global blending we can use:
  – Regional blending
  – Painterly interface
Applying a global blend
[Figure: expression A + expression B, each weighted by 1/2, gives the blended expression]
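A global blend is simply a convex combination of the vertex positions (and textures) of two corresponding expression meshes; a minimal sketch with hypothetical arrays:

```python
import numpy as np

def blend_expressions(verts_a, verts_b, w=0.5):
    """verts_*: (n_vertices, 3) arrays of corresponding mesh vertices;
       w: blend weight for expression A (1 - w goes to B)."""
    return w * verts_a + (1 - w) * verts_b
```

A region-based blend would replace the scalar `w` with a per-vertex weight mask.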
New expressions are created with 3D morphing:
Applying a region-based blend
[Figure: expressions combined with per-region weight masks]
Creating new expressions
Using a painterly interface
[Figure: four expressions combined into a "drunken smile"]
Morphing over time creates animation:
“neutral” “joy”
Video
Capture rig: video projectors, color cameras, and black & white cameras.
Spacetime faces
[Figure: stereo vs. active stereo vs. spacetime stereo]
Spacetime stereo
Matching is extended from space into time: the matching window spans neighboring frames (time = 1, 2, …, 5), so the face surface and its surface motion are estimated jointly.
Better:
• spatial resolution
• temporal stability
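The idea of extending the matching window into time can be sketched as a sum-of-squared-differences cost over a spatiotemporal window. All names and the rigid-window assumption are illustrative; the actual method also estimates the window's motion.

```python
import numpy as np

def spacetime_ssd(left, right, x, y, t, d, win=2, twin=1):
    """SSD matching cost over a (2*twin+1) x (2*win+1) x (2*win+1)
       spacetime window. left, right: (T, H, W) image sequences;
       d: candidate disparity at pixel (x, y) and frame t."""
    a = left[t - twin:t + twin + 1,
             y - win:y + win + 1,
             x - win:x + win + 1]
    b = right[t - twin:t + twin + 1,
              y - win:y + win + 1,
              x - d - win:x - d + win + 1]
    return np.sum((a - b) ** 2)
```

Averaging evidence over several frames is what buys the improved spatial resolution and temporal stability noted above.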
Video
FaceIK
3D face applications: The One
extra 3M
3D face applications: Spider-Man 2
Statistical methods
Signal model: y = f(z) + ε, where z are the parameters and y is the observed signal.

MAP estimation:
  z* = arg max_z P(z | y)
     = arg max_z P(y | z) P(z) / P(y)
     = arg min_z L(y | z) + L(z)

Examples: super-resolution, de-noising, de-blocking, inpainting, …

  z* = arg min_z L(y | z) + L(z),  where
  L(y | z) = ||y − f(z)||² / (2σ²)   (data evidence)
  L(z)                               (a-priori knowledge)
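The MAP framework above can be sketched for de-noising, assuming the identity observation f(z) = z and a Gaussian smoothness prior; parameter values here are illustrative, not from the slides.

```python
import numpy as np

def map_denoise(y, sigma=1.0, lam=5.0, steps=1000, lr=0.01):
    """Minimize L(y|z) + L(z) = ||y - z||^2 / (2*sigma^2)
       + lam * sum_i (z[i+1] - z[i])^2  by gradient descent."""
    z = y.copy()
    for _ in range(steps):
        data_grad = (z - y) / sigma ** 2       # gradient of the evidence term
        d = np.diff(z)                         # neighbor differences
        # gradient of the smoothness prior (with boundary terms)
        prior_grad = 2 * lam * np.concatenate(
            [[-d[0]], d[:-1] - d[1:], [d[-1]]])
        z -= lr * (data_grad + prior_grad)
    return z
```

The smoothness prior pulls each sample toward its neighbors while the evidence term keeps the estimate near the observation; the trade-off is set by σ and λ.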
Statistical methods
There are approximately 10^240 possible 10×10 gray-level images. Even humankind has not seen them all yet. There must be a strong statistical bias. — Takeo Kanade
(Approximately 8×10^11 blocks per day per person.)
Generic priors
“Smooth images are good images.”

  L(z) = Σ_x ρ(V(z(x)))

Gaussian MRF:  ρ(d) = d²
Huber MRF:     ρ(d) = d²                 if |d| ≤ T
               ρ(d) = T² + 2T(|d| − T)   if |d| > T
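The two generic MRF penalties above can be written directly (T is the Huber threshold; the value is illustrative):

```python
import numpy as np

def rho_gaussian(d):
    # Gaussian MRF penalty: quadratic in the neighbor difference
    return d ** 2

def rho_huber(d, T=1.0):
    # Huber MRF penalty: quadratic near zero, linear beyond T
    d = np.abs(d)
    return np.where(d <= T, d ** 2, T ** 2 + 2 * T * (d - T))
```

Beyond T the Huber penalty grows only linearly, so large intensity edges are penalized less than under the Gaussian MRF and discontinuities are better preserved.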
Example-based priors
“Existing images are good images.”
Six 200×200 images ⇒ 2,000,000 (low-resolution, high-resolution) patch pairs defining L(z).
Model-based priors
“Face images are good images when working on face images …”
Parametric model: z = WX + μ, with prior L(X).

  X* = arg min_X L(y | WX + μ) + L(X)
  z* = WX* + μ
approximating a high-dimensional data set with a lower-dimensional subspace
[Figure: 2D data points with original axes; first and second principal components]
PCA
• Given n k-d points
• Calculate the mean
• Calculate the covariance matrix
• SVD (eigen-analysis) on the covariance matrix
  A  = U · diag(w₁, …, wₙ) · Vᵀ
  Aᵀ = V · diag(w₁, …, wₙ) · Uᵀ
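The PCA steps listed above (mean, covariance, eigen-analysis via SVD) fit in a few lines of numpy; this is a generic sketch, not the slides' exact implementation:

```python
import numpy as np

def pca(points):
    """points: (n, k) array of n k-dimensional points.
       Returns the mean, principal components, and eigenvalues."""
    mean = points.mean(axis=0)
    centered = points - mean
    cov = centered.T @ centered / (len(points) - 1)   # k x k covariance
    # SVD of the symmetric covariance matrix = eigen-analysis
    U, w, Vt = np.linalg.svd(cov)
    return mean, U, w      # columns of U: principal components
```

Projecting onto the first m components is `X = (z - mean) @ U[:, :m]`, and the low-dimensional reconstruction is `mean + X @ U[:, :m].T`.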
PCA on faces: “eigenfaces”
• Average face
• First principal component
• Other components
• For all except the average: “gray” = 0, “white” > 0, “black” < 0
Super-resolution
(a) Input low-resolution 24×32  (b) Our results  (c) Cubic B-spline
(d) Freeman et al.  (e) Baker et al.  (f) Original high-resolution 96×128
Morphable model of 3D faces
• Start with a catalogue of 200 aligned 3D Cyberware scans
• Build a model of average shape and texture, and principal variations using PCA
Morphable model of 3D faces
• Divide the face into 4 regions (eyes, nose, mouth, head)
• For each new prototype, find the amount of deviation from the reference shape and texture
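The morphable model itself is linear: a new face is the average shape (or texture) plus a weighted sum of the principal variations found by PCA. A minimal sketch, with array layouts as assumptions:

```python
import numpy as np

def build_model(shapes):
    """shapes: (n_scans, 3*n_vertices) flattened, aligned 3D scans."""
    mean = shapes.mean(axis=0)
    # PCA via SVD of the centered data matrix
    U, w, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, Vt            # rows of Vt: principal shape variations

def synthesize(mean, components, alpha):
    """New face = average + weighted sum of principal variations."""
    return mean + alpha @ components
```

Fitting the model to a photograph then amounts to searching for the coefficient vector `alpha` (plus pose and lighting) that best reproduces the image.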
Reconstruction from single image
Animating from a single image
Reanimating faces
35 static scans at different expressions
exercise speech
Exchanging faces
Image-based faces (lip sync.)
Pipeline: an analysis stage fits the model to the training video; a synthesis stage selects matching lip video and stitches it together with the background video.
Results
• Video database
  – 8 minutes of Ellen
  – 2 minutes of JFK
• Only half usable
• Head rotation
Training video: “Read my lips.”  Synthesized: “I never met Forrest Gump.”
Light is additive
[Figure: image under lamp #1 + image under lamp #2 = image under both lamps]
Light stage 1.0
64×32 lighting directions
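Because light is additive, an image of the face under any novel illumination is a weighted sum of the one-light-at-a-time basis images captured by the light stage. A minimal sketch (the array shapes are assumptions):

```python
import numpy as np

def relight(basis, weights):
    """basis: (n_dirs, H, W, 3) one-light-at-a-time images from the
       light stage; weights: (n_dirs,) intensity of the novel
       illumination sampled at each lighting direction."""
    # Weighted sum of the basis images over the lighting directions
    return np.tensordot(weights, basis, axes=1)
```

With 64×32 directions, `n_dirs` would be 2048, and `weights` would come from sampling an environment map at those directions.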
Reflectance function
[Figure: a pixel’s reflectance function over lighting directions; occlusion and flare visible]
Results
Results
Spider-Man 2
real synthetic
Application: The Matrix Reloaded
Reference
• F. Pighin, J. Hecker, D. Lischinski, D. H. Salesin, and R. Szeliski.
Synthesizing realistic facial expressions from photographs.
SIGGRAPH 1998, pp. 75-84.
• Brian Guenter, Cindy Grimm, Henrique Malvar, Daniel Wood, Making Faces, SIGGRAPH 1998.
• Li Zhang, Noah Snavely, Brian Curless, Steven M. Seitz, Spacetime Faces: High Resolution Capture for Modeling and Animation,
SIGGRAPH 2004.
• Blanz, V. and Vetter, T., A Morphable Model for the Synthesis of 3D Faces, SIGGRAPH 1999, pp. 187-194.
• V. Blanz, C. Basso, T. Poggio and T. Vetter, Reanimating Faces in Images and Video, EUROGRAPHICS 2003.
• V. Blanz, K. Scherbaum, T. Vetter, H.P. Seidel, Exchanging Faces in Images, EUROGRAPHICS 2004.
• George Borshukov et al., Universal Capture - Image-based Facial Animation for "The Matrix Reloaded", SIGGRAPH 2003 Sketch.
• Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, Mark Sagar, Acquiring the Reflectance Field of a Human Face, SIGGRAPH 2000.
• Paul Debevec, Chris Tchou, Andreas Wenger, Tim Hawkins, Andy Gardner, Brian Emerson, Ansul Panday, A Lighting Reproduction Approach to Live-Action Compositing, SIGGRAPH 2002.
• Mark Sagar, Reflectance Field Rendering of Human Faces for
"Spider-Man 2", SIGGRAPH 2004 Sketch.
• Christoph Bregler, Malcolm Slaney, Michele Covell, Video Rewrite: Driving Visual Speech with Audio, SIGGRAPH 1997.
• Tony Ezzat, Gadi Geiger, Tomaso Poggio, Trainable Videorealistic Speech Animation, SIGGRAPH 2002.
• Brett Allen, Brian Curless, Zoran Popovic, The Space of Human Body Shapes: Reconstruction and Parameterization From Range Scans, SIGGRAPH 2003.