Structure from motion
Digital Visual Effects Yung-Yu Chuang
with slides by Richard Szeliski, Steve Seitz, Zhengyou Zhang and Marc Pollefyes
Outline
• Epipolar geometry and fundamental matrix
• Structure from motion
• Factorization method
• Bundle adjustment
• Applications
Epipolar geometry &
fundamental matrix
The epipolar geometry
C,C’,x,x’ and X are coplanar
epipolar geometry demo
The epipolar geometry
What if only C,C’,x are known?
The epipolar geometry
All points on project on l and l’
The epipolar geometry
Family of planes and lines l and l’ intersect at e and e’
The epipolar geometry
epipolar plane = plane containing baseline
epipolar line = intersection of epipolar plane with image epipolar pole
= intersection of baseline with image plane
= projection of projection center in the other image
epipolar geometry demo
The fundamental matrix F
C C’
T=C’-C
p R p’
T Rp'
p
Two reference frames are related via the extrinsic parameters
The fundamental matrix F
0 '
Ep
p
essential matrixT Rp'
p
0 0
0
x y
x z
y z
T T
T T
T T
T
Multiply both sides by
p
T T
T p p T Rp' T
p
T
T
T Rp'
p
T0
The fundamental matrix F
0 '
Ep p
Let M and M’ be the intrinsic matrices, then
x M
p
1p ' M '
1x ' 0
) ' '
( )
( M
1x
E M
1x 0 '
'
1
M EM x
x
0 '
Fx
x
fundamental matrixThe fundamental matrix F
• The fundamental matrix is the algebraic representation of epipolar geometry
• The fundamental matrix satisfies the condition that for any pair of corresponding points x↔x’
in the two images
0 Fx'
x
T x
Tl 0
F is the unique 3x3 rank 2 matrix that satisfies xTFx’=0 for all x↔x’
1. Transpose: if F is fundamental matrix for (x,x’), then FT is fundamental matrix for (x’,x)
2. Epipolar lines: l=Fx’ & l’=FTx
3. Epipoles: on all epipolar lines, thus eTFx’=0, x’
eTF=0, similarly Fe’=0
4. F has 7 d.o.f. , i.e. 3x3-1(homogeneous)-1(rank2)
5. F is a correlation, projective mapping from a point x to a line l=Fx’ (not a proper correlation, i.e. not invertible)
The fundamental matrix F
The fundamental matrix F
• It can be used for
– Simplifies matching
– Allows to detect wrong matches
Estimation of F — 8-point algorithm
• The fundamental matrix F is defined by
0 '
Fx x
for any pair of matches x and x’ in two images.
• Let x=(u,v,1)T and x’=(u’,v’,1)T,
33 32
31
23 22
21
13 12
11
f f
f
f f
f
f f
f F
each match gives a linear equation
0 '
' '
' '
' f11 uv f12 uf13 vu f21 vv f22 vf23 u f31 v f32 f33 uu
8-point algorithm
0 1
´
´
´
´
´
´
1
´
´
´
´
´
´
1
´
´
´
´
´
´
33 32 31 23 22 21 13 12 11
2 2
2 2
2 2
2 2
2 2 2
2
1 1
1 1
1 1
1 1
1 1 1
1
f f f f f f f f f
v u
v v
v u
v u
v u u
u
v u
v v
v u
v u
v u u
u
v u
v v
v u
v u
v u u
u
n n
n n
n n
n n
n n n
n
• In reality, instead of solving , we seek f to minimize subj. . Find the vector corresponding to the least singular value.
0
Af f Af 1
8-point algorithm
• To enforce that F is of rank 2, F is replaced by F’
that minimizes subject to .
F F ' det F ' 0
• It is achieved by SVD. Let , where , let
then is the solution.
U V
F Σ
3 2
1
0 0
0 0
0 0
Σ
0 0
0
0 0
0 0
Σ' 2
1
U V
F ' Σ'
8-point algorithm
% Build the constraint matrix
A = [x2(1,:)'.*x1(1,:)' x2(1,:)'.*x1(2,:)' x2(1,:)' ...
x2(2,:)'.*x1(1,:)' x2(2,:)'.*x1(2,:)' x2(2,:)' ...
x1(1,:)' x1(2,:)' ones(npts,1) ];
[U,D,V] = svd(A);
% Extract fundamental matrix from the column of V
% corresponding to the smallest singular value.
F = reshape(V(:,9),3,3)';
% Enforce rank2 constraint [U,D,V] = svd(F);
F = U*diag([D(1,1) D(2,2) 0])*V';
8-point algorithm
• Pros: it is linear, easy to implement and fast
• Cons: susceptible to noise
Problem with 8-point algorithm
~10000 ~10000 ~100 ~10000 ~10000 ~100 ~100 ~100 1
!
Orders of magnitude difference between column of data matrix
least-squares yields poor results
0 1
´
´
´
´
´
´
1
´
´
´
´
´
´
1
´
´
´
´
´
´
33 32 31 23 22 21 13 12 11
2 2
2 2
2 2
2 2
2 2 2
2
1 1
1 1
1 1
1 1
1 1 1
1
f f f f f f f f f
v u
v v
v u
v u
v u u
u
v u
v v
v u
v u
v u u
u
v u
v v
v u
v u
v u u
u
n n
n n
n n
n n
n n n
n
Normalized 8-point algorithm
1. Transform input by , 2. Call 8-point on to obtain 3.
i
i
Tx
x ˆ x ˆ
'i T
'x
'i' i
i
x
x ˆ ˆ , T
F T
F '
Τˆ
Fˆ
0
Fx x'
ˆ 0 ˆ
T '
FT
1x x'
Fˆ
Normalized 8-point algorithm
(0,0)
(700,500)
(700,0) (0,500)
(1,-1) (0,0)
(1,1) (-1,1)
(-1,-1)
1 500 1
2
1 700 0
2
normalized least squares yields good results Transform image to ~[-1,1]x[-1,1]
Normalized 8-point algorithm
A = [x2(1,:)‘.*x1(1,:)' x2(1,:)'.*x1(2,:)' x2(1,:)' ...
x2(2,:)'.*x1(1,:)' x2(2,:)'.*x1(2,:)' x2(2,:)' ...
x1(1,:)' x1(2,:)' ones(npts,1) ];
[U,D,V] = svd(A);
F = reshape(V(:,9),3,3)';
[U,D,V] = svd(F);
F = U*diag([D(1,1) D(2,2) 0])*V';
% Denormalise F = T2'*F*T1;
[x1, T1] = normalise2dpts(x1);
[x2, T2] = normalise2dpts(x2);
Normalization
function [newpts, T] = normalise2dpts(pts) c = mean(pts(1:2,:)')'; % Centroid
newp(1,:) = pts(1,:)-c(1); % Shift origin to centroid.
newp(2,:) = pts(2,:)-c(2);
meandist = mean(sqrt(newp(1,:).^2 + newp(2,:).^2));
scale = sqrt(2)/meandist;
T = [scale 0 -scale*c(1) 0 scale -scale*c(2) 0 0 1 ];
newpts = T*pts;
RANSAC
repeat
select minimal sample (8 matches) compute solution(s) for F
determine inliers
until (#inliers,#samples)>95% or too many times compute F based on all inliers
Results (ground truth)
Results (8-point algorithm)
Results (normalized 8-point algorithm)
Structure from motion
Structure from motion
structure for motion: automatic recovery of camera motion and scene structure from two or more images. It is a self calibration technique and called automatic camera tracking or matchmoving.
Unknown camera viewpoints
Applications
• For computer vision, multiple-view shape reconstruction, novel view synthesis and autonomous vehicle navigation.
• For film production, seamless insertion of CGI into live-action backgrounds
Matchmove
example #1 example #2 example #3
Structure from motion
2D feature
tracking 3D estimation optimization (bundle adjust)
geometry fitting
SFM pipeline
Structure from motion
• Step 1: Track Features
– Detect good features, Shi & Tomasi, SIFT – Find correspondences between frames
• Lucas & Kanade-style motion estimation
• window-based correlation
• SIFT matching
KLT tracking
http://www.ces.clemson.edu/~stb/klt/
Structure from Motion
• Step 2: Estimate Motion and Structure
– Simplified projection model, e.g., [Tomasi 92]
– 2 or 3 views at a time [Hartley 00]
Structure from Motion
• Step 3: Refine estimates
– “Bundle adjustment” in photogrammetry – Other iterative methods
Structure from Motion
• Step 4: Recover surfaces (image-based triangulation, silhouettes, stereo…)
Good mesh
Factorization methods
Problem statement
Notations
• n 3D points are seen in m views
• q=(u,v,1): 2D image point
• p=(x,y,z,1): 3D scene point
• : projection matrix
• : projection function
• qij is the projection of the i-th point on image j
• ij projective depth of qij
) (
j iij
p
q ( x , y , z ) ( x / z , y / z )
ij
z
Structure from motion
• Estimate and to minimize
) );
( ( log
) ,
, ,
, ,
(
1 1 1
1 j i ij
m
j
n
i
ij n
m p p w P Π p q
Π
Π
otherwise
j in view visible
is if
0
1 i
ij
w p
• Assume isotropic Gaussian noise, it is reduced to
2 1 1
1
1, , , , , ) ( )
( m j i ij
j
n
i
ij n
m p p w Π p q
Π
Π
jp
i• Start from a simpler projection model
Orthographic projection
• Special case of perspective projection
– Distance from the COP to the PP is infinite
– Also called “parallel projection”: (x, y, z) → (x, y)
Image World
SFM under orthographic projection
2D image point
Orthographic projection
incorporating 3D rotation 3D scene point
image offset
t Πp
q
1
2 23 31 21
• Trick
– Choose scene origin to be centroid of 3D points – Choose image origins to be centroid of 2D points – Allows us to drop the camera translation:
Πp
q
factorization (Tomasi & Kanade)
n 3 3
n 2
2
n
1 2 n2
1 q q p p p
q
projection of n features in one image:
n 3 3
n 2m 2m
2 1
2 1
2 1
2 22
21
1 12
11
n
mn m m
m
n n
p p
p Π
Π Π
q q
q
q q
q
q q
q
projection of n features in m images
W
measurementM
motionS
shapeKey Observation: rank(W) <= 3
n 3 3
m n 2
2m
' '
M S
W
• Factorization Technique
– W is at most rank 3 (assuming no noise)
– We can use singular value decomposition to factor W:
Factorization
– S’ differs from S by a linear transformation A:
– Solve for A by enforcing metric constraints on M
) )(
( '
' S MA AS
M
W
1n 3 3
m n 2
2m
W
M
S
known solve for
Results
Extensions to factorization methods
• Projective projection
• With missing data
• Projective projection with missing data
Bundle adjustment
Bundle adjustment
• n 3D points are seen in m views
• xij is the projection of the i-th point on image j
• aj is the parameters for the j-th camera
• bi is the parameters for the i-th point
• BA attempts to minimize the projection error
Euclidean distance
predicted projection
Levenberg-Marquardt method
• LM can be thought of as a combination of steepest descent and the Newton method.
When the current solution is far from the correct one, the algorithm behaves like a
steepest descent method: slow, but guaranteed to converge. When the current solution is close to the correct solution, it becomes a Newton’s method.
Bundle adjustment
MatchMove
Applications of matchmove
Jurassic park
2d3 boujou
Enemy at the Gate, Double Negative
2d3 boujou
Enemy at the Gate, Double Negative
Photo Tourism
VideoTrace
http://www.acvt.com.au/research/videotrace/
Video stabilization
References
• Richard Hartley, In Defense of the 8-point Algorithm, ICCV, 1995.
• Carlo Tomasi and Takeo Kanade, Shape and Motion from Image Streams: A Factorization Method, Proceedings of Natl. Acad. Sci., 1993.
• Manolis Lourakis and Antonis Argyros, The Design and
Implementation of a Generic Sparse Bundle Adjustment Software Package Based on the Levenberg-Marquardt Algorithm, FORTH- ICS/TR-320 2004.
• N. Snavely, S. Seitz, R. Szeliski, Photo Tourism: Exploring Photo Collections in 3D, SIGGRAPH 2006.
• A. Hengel et. al., VideoTrace: Rapid Interactive Scene Modelling from Video, SIGGRAPH 2007.
Project #3 MatchMove
• It is more about using tools in this project
• You can choose either calibration or structure from motion to achieve the goal
• Calibration
• Voodoo/Icarus
• Examples from previous classes, #1, #2
• https://www.youtube.com/user/theActionMovieKid/videos