3D photography
Digital Visual Effects, Spring 2005 Yung-Yu Chuang
2005/5/18
with slides by Szymon Rusinkiewicz, Richard Szeliski, Steve Seitz and Brian Curless
Announcements
• Project #2 winning artifacts
• Project #3 is due next Tuesday
• CGCG talk on 5/23, 2:20pm, CSIE 107
3D photography
• Acquisition of geometry and material
Range acquisition
Range acquisition taxonomy
• Contact: mechanical (CMM, jointed arm), inertial (gyroscope, accelerometer), ultrasonic trackers, magnetic trackers
• Transmissive: industrial CT, ultrasound, MRI
• Reflective
– Non-optical: radar, sonar
– Optical
Touch Probes
• Jointed arms with angular encoders
• Return position, orientation of tip
Faro Arm – Faro Technologies, Inc.
Range acquisition taxonomy
• Optical methods
– Passive: shape from X (stereo, motion, shading, texture, focus, defocus)
– Active: active variants of passive methods (stereo w. projected texture, active depth from defocus, photometric stereo), time of flight, triangulation
Outline
• Passive approaches
– Stereo
– Multiview approach
• Active approaches
– Triangulation
– Shadow scanning
• Active variants of passive approaches
– Photometric stereo
– Example-based photometric stereo
– Helmholtz stereo
Passive approaches
Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923
Stereo
• One distinguishable point being observed
– The preimage can be found at the intersection of the rays from the focal points to the image points
Stereo
• Many points being observed
– Need some method to establish correspondences
Components of stereo vision systems
• Camera calibration: previous lecture
• Image rectification: simplifies the search for correspondences
• Correspondence: which item in the left image corresponds to which item in the right image
• Reconstruction: recovers 3-D information from the 2-D correspondences
Epipolar geometry
• Epipolar constraint: corresponding points must lie on conjugate epipolar lines
– Search for correspondences becomes a 1-D problem
Image rectification
• Warp images such that conjugate epipolar lines become collinear and parallel to the u axis
Disparity
• With rectified images, disparity is just (horizontal) displacement of corresponding features in the two images
– Disparity = 0 for distant points
– Larger disparity for closer points
– Depth of point proportional to 1/disparity
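The 1/disparity relation can be made concrete with a minimal sketch. It assumes a rectified pair with focal length `focal_px` (in pixels) and camera baseline `baseline`; these names and the formula Z = f·B/d are the usual pinhole-stereo assumptions, not symbols from the slides.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline):
    """Depth Z = f * B / d for a rectified stereo pair (assumed model).

    Zero disparity corresponds to a point at infinity.
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline / d[valid]
    return depth
```

Doubling the disparity halves the depth, which is why depth resolution degrades for distant points.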
Reconstruction
• Geometric
– Construct the line segment perpendicular to R and R' that intersects both rays and take its mid-point
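A minimal sketch of this midpoint construction, assuming each ray is given by a camera center and a direction vector (the names `c1`, `d1`, `c2`, `d2` are illustrative). The closest points on the two rays come from a small least-squares problem; the segment joining them is perpendicular to both rays.

```python
import numpy as np

def midpoint_triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    # Solve s*d1 - t*d2 = c2 - c1 in the least-squares sense.
    A = np.stack([d1, -d2], axis=1)  # 3x2 system matrix
    (s, t), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    p1 = c1 + s * d1  # closest point on the first ray
    p2 = c2 + t * d2  # closest point on the second ray
    return 0.5 * (p1 + p2)
```

When the rays actually intersect, the midpoint coincides with the intersection point; with noisy correspondences it is a compromise between the two rays. Parallel rays make the system rank-deficient and are not handled here.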
Basic stereo algorithm
For each epipolar line
For each pixel in the left image
• compare with every pixel on same epipolar line in right image
• pick pixel with minimum match cost
Improvement: match windows
Basic stereo algorithm
• For each pixel
– For each disparity
• For each pixel in window
– Compute difference
– Find disparity with minimum SSD
Reverse order of loops
• For each disparity
– For each pixel
• For each pixel in window
– Compute difference
• Find disparity with minimum SSD at each pixel
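The disparity-outer loop order can be sketched as follows. This is one possible implementation under stated assumptions, not the lecture's code: a left pixel at column x is matched against the right pixel at column x − d, windows are summed with an integral image, and pixels whose match falls outside the right image get a large constant cost (`1e6`, which assumes intensities in roughly the 0 to 255 range).

```python
import numpy as np

def box_sum(img, radius):
    """Sum over a (2r+1)x(2r+1) window around each pixel (zero padding at borders)."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="constant")
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)  # integral image
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def ssd_block_matching(left, right, max_disp, radius=2):
    """Winner-take-all disparity map from windowed SSD."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    best_disp = np.zeros((h, w), dtype=int)
    for d in range(max_disp + 1):          # outer loop: disparity
        diff = np.full((h, w), 1e6)        # penalty where the match is out of view
        diff[:, d:] = (left[:, d:] - right[:, :w - d]) ** 2
        cost = box_sum(diff, radius)       # SSD for every pixel at this disparity
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_disp[better] = d
    return best_disp
```

With the disparity loop outside, the per-pixel squared differences are computed once per disparity and shared by all overlapping windows.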
Incremental computation
• Given SSD of a window, at some disparity
[Figure: matching windows in Image 1 and Image 2]
Incremental computation
• Want: SSD at next location
[Figure: windows shifted to the next location in Image 1 and Image 2]
Incremental computation
• Subtract contributions from leftmost column, add contributions from rightmost column
[Figure: Image 1 and Image 2 windows; the leftmost column is subtracted (-) and the rightmost column is added (+)]
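A minimal sketch of the incremental update along one row. It assumes `column_sums[i]` already holds the sum of squared differences of column i over the window height at a fixed disparity (a hypothetical precomputed input).

```python
import numpy as np

def sliding_window_sums(column_sums, width):
    """Window sums along a row, updated incrementally.

    Each step subtracts the column that leaves the window and adds the
    column that enters it, so the cost per step is independent of width.
    """
    column_sums = np.asarray(column_sums, dtype=float)
    n = len(column_sums) - width + 1
    out = np.empty(n)
    out[0] = column_sums[:width].sum()  # first window computed directly
    for i in range(1, n):
        out[i] = out[i - 1] - column_sums[i - 1] + column_sums[i + width - 1]
    return out
```

For example, `sliding_window_sums([1, 2, 3, 4, 5], 3)` gives the window sums 6, 9 and 12.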
Selecting window size
• Small window: more detail, but more noise
• Large window: more robustness, less detail
• Example:
Selecting window size
[Figure: results with a 3 pixel window and a 20 pixel window]
Non-square windows
• Compromise: have a large window, but higher weight near the center
• Example: Gaussian
• For each disparity
– For each pixel
• Compute weighted SSD
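A sketch of the weighted SSD for a single disparity, assuming a separable Gaussian weight and the convention that left column x matches right column x − d. The parameter names `radius` and `sigma` are illustrative.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """1-D Gaussian weights, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def weighted_ssd(left, right, d, radius=2, sigma=1.0):
    """Gaussian-weighted SSD at disparity d for columns d..w-1 of the left image."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    w = left.shape[1]
    diff = (left[:, d:] - right[:, :w - d]) ** 2
    k = gaussian_kernel(radius, sigma)
    # Separable filtering: rows first, then columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, diff)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)
```

The disparity map is then obtained as before, by keeping the disparity with the smallest weighted cost at each pixel.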
Ordering constraint
• Order of matching features usually the same in both images
• But not always: occlusion
Dynamic programming
• Treat feature correspondence as graph problem
[Figure: graph with left image features 1 to 4 on one axis and right image features 1 to 4 on the other; cost of edges = similarity of regions between image features]
Dynamic programming
• Find min-cost path through graph
[Figure: min-cost path through the graph of left image features vs. right image features]
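One common way to realize the min-cost path is a scanline dynamic program, sketched below under assumptions that are not in the slides: features are scalar intensities, the match cost is the squared difference, and skipping a feature (occlusion) costs a fixed `occlusion_cost`. Moving only right or down in the table enforces the ordering constraint.

```python
import numpy as np

def dp_match(left, right, occlusion_cost=1.0):
    """Order-preserving matching of two feature sequences by dynamic programming."""
    n, m = len(left), len(right)
    total = np.zeros((n + 1, m + 1))
    total[:, 0] = occlusion_cost * np.arange(n + 1)
    total[0, :] = occlusion_cost * np.arange(m + 1)
    step = np.zeros((n + 1, m + 1), dtype=int)  # 0 = match, 1 = skip left, 2 = skip right
    step[1:, 0] = 1
    step[0, 1:] = 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = (
                total[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2,
                total[i - 1, j] + occlusion_cost,
                total[i, j - 1] + occlusion_cost,
            )
            step[i, j] = int(np.argmin(options))
            total[i, j] = options[step[i, j]]
    # Backtrack from the end of both sequences to recover the matches.
    matches = []
    i, j = n, m
    while i > 0 or j > 0:
        if step[i, j] == 0:
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return matches[::-1], float(total[n, m])
```

In the test case used here, the left feature with value 20 has no counterpart in the right sequence, so the path skips it and pays one occlusion cost.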
Energy minimization
• Another approach to improve quality of correspondences
• Assumption: disparities vary (mostly) smoothly
• Minimize energy function:
$E_{data} + \lambda E_{smoothness}$
• $E_{data}$: how well does disparity match data
• $E_{smoothness}$: how well does disparity match that of neighbors (regularization)
Energy minimization
• If data and energy terms are nice (continuous, smooth, etc.) can try to minimize via gradient descent, etc.
• In practice, disparities only piecewise smooth
• Design smoothness function that doesn’t penalize large jumps too much
– Example: V(α,β)=min(|α−β|, K)
Stereo as energy minimization
• Matching Cost Formulated as Energy
– “data” term penalizing bad matches
$$D(x, y, d) = |I(x, y) - J(x + d, y)|$$
– “neighborhood term” encouraging spatial smoothness
$$V(d_1, d_2) = \text{cost of adjacent pixels with labels } d_1 \text{ and } d_2$$
$$V(d_1, d_2) = |d_1 - d_2| \quad \text{(or something similar)}$$
$$E = \sum_{(x, y)} D(x, y, d_{x,y}) + \sum_{\text{neighbors } (x_1, y_1), (x_2, y_2)} V(d_{x_1, y_1}, d_{x_2, y_2})$$
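A minimal sketch that evaluates this energy for a given disparity map. It assumes a 4-connected neighborhood, the truncated smoothness cost V(α,β)=min(|α−β|, K), a weight `lam` standing in for λ, and clamping of matches that fall outside image J; the clamping in particular is a simplification chosen for this sketch.

```python
import numpy as np

def stereo_energy(I, J, disp, lam=1.0, K=2):
    """E = sum D(x, y, d) + lam * sum V(d1, d2) over 4-connected neighbors."""
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    disp = np.asarray(disp, dtype=int)
    h, w = I.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xj = np.clip(xs + disp, 0, w - 1)  # clamp out-of-range matches (assumption)
    data = np.abs(I - J[ys, xj]).sum()  # D(x, y, d) = |I(x, y) - J(x + d, y)|

    def V(a, b):
        return np.minimum(np.abs(a - b), K)  # truncated linear penalty

    smooth = V(disp[:, 1:], disp[:, :-1]).sum() + V(disp[1:, :], disp[:-1, :]).sum()
    return float(data + lam * smooth)
```

Because V is truncated at K, a large disparity jump at an object boundary costs no more than a jump of size K.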
Energy minimization
• Hard to find global minima of non-smooth functions
– Many local minima
– Provably NP-hard
• Practical algorithms look for approximate minima (e.g., simulated annealing)
Energy minimization via graph cuts
[Figure: pixels connected to label (disparity) nodes d1, d2, d3, with edge weights D(x, y, d3) and V(d1, d1)]
• Graph Cost
– Matching cost between images
– Neighborhood matching term
– Goal: figure out which labels are connected to which pixels
Energy minimization via graph cuts
• Graph Cut
– Delete enough edges so that
• each pixel is (transitively) connected to exactly one label node
– Cost of a cut: sum of deleted edge weights
– Finding min cost cut equivalent to finding global minimum of energy function
Computing a multiway cut
• With 2 labels: classical min-cut problem
– Solvable by standard flow algorithms
• polynomial time in theory, nearly linear in practice
– More than 2 terminals: NP-hard [Dahlhaus et al., STOC ‘92]
• Efficient approximation algorithms exist
– Within a factor of 2 of optimal
– Computes local minimum in a strong sense
• even very large moves will not improve the energy
– Yuri Boykov, Olga Veksler and Ramin Zabih, Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999.
Move examples
Starting point
Red-blue swap move
Green expansion move
The swap move algorithm
1. Start with an arbitrary labeling
2. Cycle through every label pair (A,B) in some order
2.1 Find the lowest E labeling within a single AB-swap
2.2 Go there if it’s lower E than the current labeling
3. If E did not decrease in the cycle, we’re done. Otherwise, go to step 2
[Figure: original graph with labels A and B; AB subgraph (run min-cut on this graph)]
The expansion move algorithm
1. Start with an arbitrary labeling
2. Cycle through every label A in some order
2.1 Find the lowest E labeling within a single A-expansion
2.2 Go there if it’s lower E than the current labeling
3. If E did not decrease in the cycle, we’re done. Otherwise, go to step 2
Stereo results
ground truth scene
– Data from University of Tsukuba
Results with window correlation
normalized correlation (best window size)
ground truth
Results with graph cuts
ground truth graph cuts
(Potts model E, expansion move algorithm)
Volumetric multiview approaches
• Goal: find a model consistent with images
• “Model-centric” (vs. image-centric)
• Typically use discretized volume (voxel grid)
• For each voxel, compute occupied / free (for some algorithms, also color, etc.)
Photo consistency
• Result: not necessarily correct scene
• Many scenes produce the same images
[Figure: within the set of all scenes, the photo-consistent scenes include both the true scene and the reconstructed scene]
Silhouette carving
• Find silhouettes in all images
• Exact version:
– Back-project all silhouettes, find intersection
[Figure: binary images]
Silhouette carving
• Limit of silhouette carving is visual hull or line hull
• Visual hull: complement of all lines that don’t intersect the object
• In general not the same as object
– Can’t recover “pits” in object
• Not the same as convex hull
Silhouette carving
• Discrete version:
– Loop over all voxels in some volume
– If projection into images lies inside all silhouettes, mark as occupied
– Else mark as free
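The discrete version can be sketched as below. The inputs are assumptions for illustration: `cameras` is a list of 3x4 projection matrices, `silhouettes` a list of boolean masks indexed as `mask[v, u]`, and all voxels are taken to lie in front of every camera. Voxels that project outside an image are treated as outside the silhouette.

```python
import numpy as np

def carve(voxel_centers, cameras, silhouettes):
    """Return a boolean array: True where a voxel lies inside all silhouettes."""
    centers = np.asarray(voxel_centers, dtype=float)
    pts = np.hstack([centers, np.ones((len(centers), 1))])  # homogeneous coordinates
    occupied = np.ones(len(pts), dtype=bool)
    for P, mask in zip(cameras, silhouettes):
        uvw = pts @ np.asarray(P, dtype=float).T   # project into this image
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(pts), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]
        occupied &= hit                             # must be inside every silhouette
    return occupied
```

Only the voxel center is tested here; a more careful version would project the whole voxel footprint.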
Silhouette carving
Voxel coloring
• Seitz and Dyer, 1997
• In addition to free / occupied, store color at each voxel
• Explicitly accounts for occlusion
Voxel coloring
• Basic idea: sweep through a voxel grid
– Project each voxel into each image in which it is visible
– If colors in images agree, mark voxel with color
– Else, mark voxel as empty
• Agreement of colors based on comparing standard deviation of colors to threshold
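A minimal sketch of the agreement test. It assumes `colors` holds one RGB sample per image in which the voxel is visible, and it uses the largest per-channel standard deviation as the spread measure; that choice is an assumption, since the slide only says the standard deviation is compared to a threshold.

```python
import numpy as np

def is_color_consistent(colors, threshold):
    """Return (consistent, mean_color) for the projections of one voxel."""
    colors = np.asarray(colors, dtype=float)   # shape (num_views, 3)
    spread = colors.std(axis=0).max()          # worst channel decides
    return bool(spread <= threshold), colors.mean(axis=0)
```

A consistent voxel is marked with the mean color; an inconsistent one is marked empty.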
Voxel coloring and occlusion
• Problem: which voxels are visible?
• Solution: constrain camera views
– When a voxel is considered, necessary occlusion information must be available
– Sweep occluders before occludees
– Constrain camera positions to allow this sweep
Voxel coloring sweep order
[Figure: layers; scene traversal]
Voxel coloring camera positions
• Inward-looking: cameras above scene
• Outward-looking: cameras inside scene
Image acquisition
• Calibrated turntable
• 360° rotation (21 images)
[Figure: selected dinosaur images; selected flower images]
Voxel coloring results
• Dinosaur reconstruction: 72 K voxels colored, 7.6 M voxels tested, 7 min. to compute on a 250MHz SGI
• Flower reconstruction: 70 K voxels colored, 7.6 M voxels tested, 7 min. to compute on a 250MHz SGI
Limitations of voxel coloring
• A view-independent depth order may not exist
• Need more powerful general-case algorithms
– Unconstrained camera positions
– Unconstrained scene geometry/topology
Space carving
[Figure: volume V seen from Image 1 … Image N]
• Initialize to a volume V containing the true scene
• Repeat until convergence
– Choose a voxel on the current surface
– Project to visible input images
– Carve if not photo-consistent
Multi-pass plane sweep
• Faster alternative:
– Sweep plane in each of 6 principal directions
– Consider cameras on only one side of plane
– Repeat until convergence
Multi-pass plane sweep
[Figure: true scene and reconstruction]
[Figure: input image (1 of 45) and reconstructions]
Space carving results: African violet
[Figure: input image (1 of 100) and reconstruction]
Space carving results: hand
Active approaches
Time of flight
• Basic idea: send out pulse of light (usually laser), time how long it takes to return
$$r = \frac{1}{2} c\,\Delta t$$
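A small numeric sketch of the round-trip relation, range = c·Δt/2, using the vacuum speed of light. The function names are illustrative.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def range_from_round_trip(dt_seconds):
    """Distance to the surface from the measured round-trip time."""
    return 0.5 * C * dt_seconds

def round_trip_for_range(r_meters):
    """Round-trip time corresponding to a given distance."""
    return 2.0 * r_meters / C
```

A 10 ns round trip corresponds to about 1.5 m, and resolving 1 mm in range requires timing on the order of 6.7 ps, which is why time of flight suits large scenes better than small objects.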
Laser scanning (triangulation)
• Optical triangulation
– Project a single stripe of laser light
– Scan it across the surface of the object
– This is a very precise version of structured light scanning
• Other patterns are possible
[Figure: laser with cylindrical lens producing a laser sheet on the object; CCD and CCD image plane; direction of travel]
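The triangulation step itself can be sketched as a ray-plane intersection: the lit pixel defines a ray from the camera, and the laser sheet is a known plane. The plane representation n·x + d = 0 and the argument names are assumptions for illustration; obtaining the ray and the plane requires calibration that is not shown.

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_normal, plane_d):
    """Point on the ray origin + t*direction that satisfies n.x + d = 0."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    denom = n @ direction
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    t = -(n @ origin + plane_d) / denom
    return origin + t * direction
```

Depth accuracy depends on the angle between the camera ray and the laser sheet: as the two become nearly parallel, `denom` approaches zero and small pixel errors turn into large depth errors.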
Digital Michelangelo Project
http://graphics.stanford.edu/projects/mich/
Shadow scanning
[Figure: setup with desk lamp, camera, stick or pencil, and desk]
http://www.vision.caltech.edu/bouguetj/ICCV98/
Basic idea
• Calibration issues:
– where’s the camera wrt. ground plane?
– where’s the shadow plane?
• depends on light source position, shadow edge
Two Plane Version
• Advantages
– don’t need to pre-calibrate the light source
– shadow plane determined from two shadow edges
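A sketch of how two shadow edges can determine the shadow plane, assuming the edges have already been lifted to 3D on two known planes: take two points `a` and `b` on one edge and a point `c` on the other (hypothetical inputs) and form the plane through them.

```python
import numpy as np

def plane_from_points(a, b, c):
    """Plane (unit normal n, offset d) with n.x + d = 0 through three points."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    n = np.cross(b - a, c - a)
    length = np.linalg.norm(n)
    if length < 1e-12:
        raise ValueError("points are collinear")
    n = n / length
    return n, float(-(n @ a))
```

With noisy edge estimates, a least-squares fit through many points on both edges would be more robust than three points.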
Estimating shadow lines
Shadow scanning in action
accuracy: 0.1mm over 10cm ~ 0.1% error
Results
Textured objects
accuracy: 1mm over 50cm ~ 0.5% error
Scanning with the sun
accuracy: 1cm over 2m ~ 0.5% error
Active variants of passive approaches
The BRDF
• The Bidirectional Reflection Distribution Function
– Given an incoming ray and outgoing ray what proportion of the incoming light is reflected along outgoing ray?
$$I = \gamma\,\rho(l, v) \times (l \cdot n)$$
where n is the surface normal
Diffuse reflection (Lambertian)
[Figure: surface point P with normal N, light direction L and view direction V]
$$\rho(l, v) = k_d \quad \text{(albedo)}$$
Assuming that light strength is 1.
Photometric stereo
[Figure: normal N, view direction V and light directions L1, L2, L3]
• Can write this as a matrix equation:
Solving the equations
More than three lights
• Get better results by using more lights
• Least squares solution:
• Solve for N, kd as before
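A minimal sketch of the least-squares solve, assuming the Lambertian model I_i = kd (N · L_i) with unit light strength. `L` stacks the light directions as rows and `I` holds the intensities, one row per light and one column per pixel; both names are illustrative.

```python
import numpy as np

def photometric_stereo(L, I):
    """Recover unit normals N (3 x P) and albedos kd (P,) from n >= 3 lights.

    L: (n, 3) light directions; I: (n, P) intensities for P pixels.
    """
    L = np.asarray(L, dtype=float)
    I = np.asarray(I, dtype=float)
    G, *_ = np.linalg.lstsq(L, I, rcond=None)  # G = kd * N, shape (3, P)
    kd = np.linalg.norm(G, axis=0)             # albedo is the length of G
    N = G / kd                                 # assumes kd > 0 at every pixel
    return N, kd
```

With exactly three non-coplanar lights this reduces to solving a 3x3 system; extra lights average out noise.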
Trick for handling shadows
• Weight each equation by the pixel brightness:
• Gives weighted least-squares matrix equation:
• Solve for N, kd as before
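A sketch of the weighting trick for a single pixel, under the same assumed Lambertian model I_i = kd (N · L_i): multiplying equation i by I_i gives I_i² = I_i kd (L_i · N), so a shadowed measurement with I_i = 0 contributes nothing to the solution.

```python
import numpy as np

def photometric_stereo_weighted(L, I):
    """Brightness-weighted least squares for one pixel.

    L: (n, 3) light directions; I: (n,) intensities, zeros where shadowed.
    """
    L = np.asarray(L, dtype=float)
    I = np.asarray(I, dtype=float)
    A = I[:, None] * L           # each row scaled by its pixel brightness
    b = I * I
    G, *_ = np.linalg.lstsq(A, b, rcond=None)  # G = kd * N
    kd = np.linalg.norm(G)
    return G / kd, kd
```

At least three unshadowed, non-coplanar lights must remain for the solve to be well posed.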
Photometric Stereo Setup Procedure
• Calibrate camera
• Calibrate light directions/intensities
• Photograph objects (HDR recommended)
• Estimate normals
• Estimate depth
Estimating light directions
• Trick: place a chrome sphere in the scene
– the location of the highlight tells you where the light source is
• Use a ruler
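The chrome sphere trick can be sketched as follows, under assumptions not stated on the slide: an orthographic camera looking down the z axis (so the direction toward the camera is (0, 0, 1)), a distant light, and image x, y axes aligned with the world x, y axes. The highlight position gives the sphere normal, and the light direction is the mirror reflection of the view direction about that normal.

```python
import numpy as np

def light_from_highlight(highlight_xy, center_xy, radius):
    """Unit light direction from the highlight position on a chrome sphere."""
    nx = (highlight_xy[0] - center_xy[0]) / radius
    ny = (highlight_xy[1] - center_xy[1]) / radius
    nz = np.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    n = np.array([nx, ny, nz])            # sphere normal at the highlight
    v = np.array([0.0, 0.0, 1.0])         # direction from sphere toward camera
    return 2.0 * (n @ v) * n - v          # reflect v about n
```

A highlight at the sphere center means the light is behind the camera; a highlight at 1/√2 of the radius from the center means the light comes from the side, at 90° to the viewing direction.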
Photographing objects
Normalize light intensities
Estimate normals
Depth from normals
Results
Limitations
• Big problems
– doesn’t work for shiny things, semi-translucent things
– shadows, inter-reflections
• Smaller problems
– calibration requirements
• measure light source directions, intensities
• camera response function
Example-based photometric stereo
• Estimate 3D shape by varying illumination, fixed camera
• Operating conditions
– any opaque material
– distant camera, lighting
– reference object available
– no shadows, interreflections, transparency
same surface normal
“Orientation consistency”
Virtual views
Velvet Virtual Views
Brushed Fur Virtual Views
Helmholtz Stereo
• Based on Helmholtz reciprocity: surface reflectance is the same under interchange of light, viewer
• So, take pairs of observations w. viewer, light interchanged
• Ratio of the observations in a pair is independent of surface material
References
• D. Scharstein and R. Szeliski. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV 2002.
• S. Seitz and C. Dyer. Photorealistic Scene Reconstruction by Voxel Coloring, CVPR 1997.
• J.-Y. Bouguet and P. Perona. 3D Photography on Your Desk, ICCV 1998.
• T. Zickler, P. Belhumeur and D. Kriegman. Helmholtz Stereopsis: Exploiting Reciprocity for Surface Reconstruction, ECCV 2002.
• A. Hertzmann and S. Seitz. Shape and Materials by Example: A Photometric Stereo Approach, CVPR 2003.