### Structure from motion

Digital Visual Effectsg
*Yung-Yu Chuang*

*with slides by Richard Szeliski, Steve Seitz, Zhengyou Zhang and Marc Pollefyes*

**Outline**

• Epipolar geometry and fundamental matrix

S f i

• Structure from motion

• Factorization method

• Bundle adjustment

• Applications

• Applications

**Epipolar geometry & **

**fundamental matrix**

**The epipolar geometry**

epipolar geometry demo

*C C’ x x’* *and X are coplanar*
*C,C ,x,x* *and X are coplanar*

**The epipolar geometry**

*What if only C C’ x are known?*

*What if only C,C ,x are known?*

**The epipolar geometry**

*All points on project on l and l’*

*All points on project on l and l*

**The epipolar geometry**

*Family of planes and lines l and l’ intersect at e*
*Family of planes and lines l and l intersect at e*
*and e’*

**The epipolar geometry**

epipolar pole

= intersection of baseline with image plane

epipolar geometry demo

= intersection of baseline with image plane

= projection of projection center in other image

epipolar plane = plane containing baseline epipolar plane plane containing baseline

epipolar line = intersection of epipolar plane with image

**The fundamental matrix F**

**C** **C’**

**p** **R** **p’**

**C**

**T=C’-C**

**T** **Rp'**

**p**

Two reference frames are related via the extrinsic parameters

**T** **Rp**

**p**

**The fundamental matrix F**

**T** **Rp'**

**p**

###

^{T}_{}

^{}

^{}

^{T}^{0}

^{}

^{0}

^{T}

^{z}

^{T}^{T}

^{y}_{}

^{}

^{}

Multiply both sides by

^{p}

^{p}

^{T}

### ^{T}

^{T}

_{}

###

0 0

*x*
*y*

*x*
*z*

*T*
*T*

*T*
**T** *T*

### ^{T} ^{p} ^{p} ^{T} ^{Rp'} ^{T}

^{T}

^{p}

^{p}

^{T}

^{Rp'}

^{T}

**p**

^{T}

###

_{}

^{p} ^{p}

^{p}

^{p}

^{T}

###

_{}

### ^{p}

^{p}

**p**

_{}

_{}

### ^{T} ^{Rp'}

^{T}

^{Rp'}

### **p**

^{T}

### 0

### 0 '

**Ep**

**p**

essential matrix
### ^{T} ^{Rp}

^{T}

^{Rp}

**p**

_{}

### 0

### 0 **Ep**

**p**

**The fundamental matrix F**

### 0 '

**Ep** **p p** **p**

**Let M and M’ be the intrinsic matrices, then**

**x** **M**

**p**

^{}

^{1}

**p** ' **M** '

^{}

^{1}

**x** ' 0

### ) ' '

### ( )

### ( **M**

^{}

^{1}

**x** )

^{}

**E** ( **M**

^{}

^{1}

**x** ) 0 ( **M** **x** **E** **M** **x**

### 0 '

### '

^{}

^{1}

###

**M** **EM** **x**

**x**

### 0 '

**Fx**

**x**

fundamental matrix
**The fundamental matrix F**

• The fundamental matrix is the algebraic representation of epipolar geometry

representation of epipolar geometry

Th f d t l t i ti fi th diti

• The fundamental matrix satisfies the condition that for any pair of corresponding points x↔x’

i th t i

in the two images

### 0 Fx'

### x

^{T}

_{Fx} _{0} ^{x}

^{T}

^{l} ^{} ^{0}

### x ^{x} ^{l} ^{0}

**The fundamental matrix F**

F is the unique 3x3 rank 2 matrix that satisfies x^{T}Fx’=0
for all x↔x’

for all x↔x

**1. Transpose: if F is fundamental matrix for (P,P’), then F**^{T}
is fundamental matrix for (P’,P)

**2 Epipolar lines: l F ’ & l’ F**^{T}
**2. Epipolar lines: l=Fx’ & l’=F**^{T}x

**3. Epipoles: on all epipolar lines, thus e**^{T}Fx’=0, x’

e^{T}F=0, similarly Fe’=0

e F 0, similarly Fe 0

**4. F has 7 d.o.f. , i.e. 3x3-1(homogeneous)-1(rank2)**

**5. F is a correlation, projective mapping from a point x to **
a line l=Fx’ (not a proper correlation, i.e. not invertible)

**The fundamental matrix F**

• It can be used for

– Simplifies matching

– Allows to detect wrong matchesAllows to detect wrong matches

**Estimation of F — 8-point algorithm**

• The fundamental matrix F is defined by

### 0 '

**Fx**

**x**

**for any pair of matches x and x’ in two images.**

*f*_{11} *f*_{12} *f*_{13}

**• Let x=(u,v,1)**^{T} **and x’=(u’,v’,1)**^{T},

_{21} _{22} _{23}

13 12

11

*f*
*f*

*f*

*f*
*f*

*f*

*f*
*f*

*f*
**F**

*f*^{31} *f*^{32} *f*^{33}

each match gives a linear equation

0 '

' '

' '

' *f*_{11} *uv* *f*_{12} *uf*_{13} *vu* *f*_{21} *vv* *f*_{22} *vf*_{23} *u* *f*_{31} *v* *f*_{32} *f*_{33}
*uu*

**8-point algorithm**

11

*f*
*f*

1

´

´

´

´

´

´ ^{13}

12

1 1

1 1

1 1

1 1

1 1 1

1

*f*
*f*
*f*
*v*

*u*
*v*

*v*
*v*
*u*

*v*
*u*

*v*
*u*
*u*

*u*

1 0

´

´

´

´

´

´

22 21 2

2 2

2 2 2

2 2

2 2 2

2

1 1

1 1

1 1

1 1

1 1 1

1

*f*
*v* *f*

*u*
*v*

*v*
*v*
*u*

*v*
*u*

*v*
*u*
*u*

*u*

1

´

´

´

´

´

´

31 23

*f*

*v* *f*
*u*

*v*
*v*

*v*
*u*

*v*
*u*

*v*
*u*
*u*

*u*_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}

33 32

* f*
*f*

33

* f*

**• In reality, instead of solving , we seek f**
to minimize subj. . Find the vector

**Af** 0

**Af** _{j} **f** 1

corresponding to the least singular value.

**8-point algorithm**

• To enforce that F is of rank 2, F is replaced by F’ that minimizes subject to

**F** **F** ' **det F** ' 0

F that minimizes subject to .

**F** **F** ' det **F** ' 0

• It is achieved by SVD. Let , where ^{y}

**F** **U** Σ **V**

^{,}

^{}

let

_{1}

0 0

0 0

Σ

0 0

0 0

Σ'

1

, let

3 2

0 0

0 0

Σ

0 0

0

0 0

Σ' _{2}

then is the solution.

**F** ' **U** Σ' **V**

^{}

**8-point algorithm**

% Build the constraint matrix

A = [x2(1,:)‘.*x1(1,:)' x2(1,:)'.*x1(2,:)' x2(1,:)' ...[ ( , ) ( , ) ( , ) ( , ) ( , ) x2(2,:)'.*x1(1,:)' x2(2,:)'.*x1(2,:)' x2(2,:)' ...

x1(1,:)' x1(2,:)' ones(npts,1) ];

[U,D,V] = svd(A);

% Extract fundamental matrix from the column of V

% corresponding to the smallest singular value.p g g F = reshape(V(:,9),3,3)';

% E f k2 t i t

% Enforce rank2 constraint [U,D,V] = svd(F);

F = U*diag([D(1 1) D(2 2) 0])*V';

F = U diag([D(1,1) D(2,2) 0]) V ;

**8-point algorithm**

• Pros: it is linear, easy to implement and fast

C ibl i

• Cons: susceptible to noise

**Problem with 8-point algorithm**

11

*f*
*f*

1

´

´

´

´

´

´ ^{13}

12

1 1

1 1

1 1

1 1

1 1 1

1

*f*
*f*
*f*
*v*

*u*
*v*

*v*
*v*
*u*

*v*
*u*

*v*
*u*
*u*

*u*

1 0

´

´

´

´

´

´

22 21 2

2 2

2 2 2

2 2

2 2 2

2

1 1

1 1

1 1

1 1

1 1 1

1

*f*
*v* *f*

*u*
*v*

*v*
*v*
*u*

*v*
*u*

*v*
*u*
*u*

*u*

10000 10000 100 10000 10000 100 100 ~100 1

1

´

´

´

´

´

´

31 23

*f*

*v* *f*
*u*

*v*
*v*

*v*
*u*

*v*
*u*

*v*
*u*
*u*

*u*_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}_{n}

~10000 ~10000 ~100 ~10000 ~10000 ~100 ~100 ~100 1

**!**

Orders of magnitude difference

between column of data matrix ^{33}

32

* f*
*f*

**!** between column of data matrix

least-squares yields poor results

33

* f*

**Normalized 8-point algorithm**

1. Transform input by , 2 C ll 8 i b i

**i**

**i**

**Tx**

**x** ˆ **x** ˆ

^{'}

_{i}### **Tx**

^{'}

_{i}### ˆ

**'**

### ˆ ˆ

2. Call 8-point on to obtain 3.

**i**
**i**

**x** **x ˆ** ˆ , **T**

**F** **T**

**F** *'*

^{Τ}

### ˆ

**F**

### 0

**Fx**

**x'** **Fx** 0 **x**

**x'**

### ˆ 0 ˆ

^{}

**T** '

^{}

^{}

**FT**

^{}

^{1}

**x** **x'**

**x'**

**Fˆ**

**Normalized 8-point algorithm**

normalized least squares yields good results

T f i [ 1 1] [ 1 1]

(700,500)

(0,500) (-1,1) (1,1)

2

Transform image to ~[-1,1]x[-1,1]

( , )

( , ) ( , ) ( , )

500 1

2

1 700 0

2

(0,0)

1

(0,0) (700,0) (-1,-1) (1,-1)

**Normalized 8-point algorithm**

[x1, T1] = normalise2dpts(x1);

[x2, T2] = normalise2dpts(x2);

A = [x2(1,:)‘.*x1(1,:)' x2(1,:)'.*x1(2,:)' x2(1,:)' ...

x2(2,:)'.*x1(1,:)' x2(2,:)'.*x1(2,:)' x2(2,:)' ...

[ , ] p ( );

x1(1,:)' x1(2,:)' ones(npts,1) ];

[U D V] svd(A);

[U,D,V] = svd(A);

F = reshape(V(:,9),3,3)';

F reshape(V(:,9),3,3) ; [U,D,V] = svd(F);

F = U*diag([D(1,1) D(2,2) 0])*V';

% Denormalise F = T2'*F*T1;

**Normalization**

function [newpts, T] = normalise2dpts(pts) c = mean(pts(1:2,:)')'; % Centroid

newp(1,:) = pts(1,:)-c(1); % Shift origin to centroid.p( , ) p ( , ) ( ); g newp(2,:) = pts(2,:)-c(2);

meandist = mean(sqrt(newp(1,:).^2 + newp(2,:).^2));

scale = sqrt(2)/meandist;

T = [scale 0 -scale*c(1) 0 scale -scale*c(2) 0 0 1 ];

t T* t newpts = T*pts;

**RANSAC**

repeat

select minimal sample (8 matches) compute solution(s) for F

determine inliers

*until (#inliers,#samples)>95% or too many times *( , *p* ) y
compute F based on all inliers

**Results (ground truth)**

**Results (8-point algorithm)**

**Results (normalized 8-point algorithm)**

**Structure from motion**

**Structure from motion**

**Structure from motion**

**Unknown**
**Unknown**
**Unknown**
**Unknown**
**camera**
**camera**
**viewpoints**
**viewpoints**

**structure for motion: automatic recovery of camera motion**
**and scene structure from two or more images. It is a self **
*calibration technique and called automatic camera tracking*
*calibration technique and called automatic camera tracking*
*or matchmoving.*

**Applications**

• For computer vision, multiple-view shape reconstruction novel view synthesis and reconstruction, novel view synthesis and autonomous vehicle navigation.

F fil d ti l i ti f CGI

• For film production, seamless insertion of CGI into live-action backgrounds

**Matchmove**

**example #1** **example #2** **example #3** **example #4**

**Structure from motion**

**2D feature**

**tracking** **3D estimation** **optimization**
**(bundle adjust)**

**geometry **
**fitting**

**SFM pipeline**
**SFM pipeline**

**Structure from motion**

• Step 1: Track Features

Detect good features Shi & Tomasi SIFT – Detect good features, Shi & Tomasi, SIFT – Find correspondences between frames

• Lucas & Kanade-style motion estimationy

• window-based correlation

• SIFT matching

**KLT tracking**

http://www ces clemson edu/~stb/klt/

http://www.ces.clemson.edu/ stb/klt/

**Structure from Motion**

• Step 2: Estimate Motion and Structure

Si lifi d j ti d l **[T** **i 92]**

**– Simplified projection model, e.g., [Tomasi 92]**

**– 2 or 3 views at a time [Hartley 00]**

**Structure from Motion**

• Step 3: Refine estimates

“B dl dj t t” i h t t – “Bundle adjustment” in photogrammetry – Other iterative methods

**Structure from Motion**

• Step 4: Recover surfaces (image-based triangulation silhouettes stereo )

triangulation, silhouettes, stereo…)

**Good mesh**
**Good mesh**

**Factorization methods**

**Factorization methods**

**Problem statement**

**Notations**

*• n 3D points are seen in m views*

*(* *1) 2D i* *i*

**• q=(u,v,1): 2D image point**

**• p=(x,y,z,1): 3D scene point**

• : projection matrix

• : projection function

• : projection function

*• q*_{ij}*is the projection of the i-th point on image j*

j ti d th f

• _{ij}*projective depth of q*_{ij}

### ) ( **p**

**q**

_{ij}### (

_{j}**p**

_{i}### ) ^{}

^{(}

^{x}

^{y}

^{z}^{)}

^{}

^{(}

^{x}^{/}

^{z}

^{y}^{/}

^{z}^{)}

**q** ^{}

^{(}

^{x}^{,}

^{y}^{,}

^{z}^{)}

^{}

^{(}

^{x}^{/}

^{z}^{,}

^{y}^{/}

^{z}^{)}

*ij* *z*

###

**Structure from motion**

• Estimate and to minimize

###

*j*

**p**

*i*

) );

( ( log )

, ,

, ,

, (

1 1 1

1 *j* *i* *ij*

*m*

*j*

*n*

*i*

*ij*
*n*

*m* **p** **p** *w* *P* **Π** **p** **q**

**Π**

**Π**

###

*j*

otherwise

j in view visible

is if

0

1 _{i}

*ij*

*w* *p*

otherwise

*j* 0

• Assume isotropic Gaussian noise, it is reduced top ,

2 1

1, , , , , ) ( )

(**Π** **Π** **p** **p**

###

^{m}

^{n}*w*

_{ij}^{}

**Π**

_{j}**p**

**

_{i}**q**

_{ij} _{} _{}

1 1 1

1, , , , , ) ( )

( _{j}_{i}_{ij}

*j* *i*

*ij*
*n*

*m* **p** **p** *w* **Π** **p** **q**

**Π**

**Π**

###

• Start from a simpler projection model

• Start from a simpler projection model

**Orthographic projection**

• Special case of perspective projection

Di t f th COP t th PP i i fi it – Distance from the COP to the PP is infinite

Image World

– Also called “parallel projection”: (x, y, z) → (x, y)

**SFM under orthographic projection**

**2D image ** **Orthographic projection**

**incorporating 3D rotation** **3D scene**

**image**
**offset**
**point** **incorporating 3D rotation** **3D scene**

**point**

**offset**

**t** **Π** **t** **Πp**

**q**

1

2^{1} 23 31 21
2 23 31 21

• Trick

– Choose scene origin to be centroid of 3D pointsg p – Choose image origins to be centroid of 2D points – Allows us to drop the camera translation:Allows us to drop the camera translation:

**Πp**

**q** ** p**

**q**

**factorization (Tomasi & Kanade)**

### ^{} ^{}

**projection of n features in one image:**

### ^{} ^{}

n 3 3

n 2

2

^{n}

###

^{1}

^{2}

^{n}**2**

**1** **q** **q** **p** **p** **p**

**q**

1 1 12

11

**q** **q** **q** _{n}**Π**

**projection of n features in m images**

###

3^{2}

1 2 2

22 21

1 12

11

*n*
*n*

*n*

**p**
**p**

**Π** **p**
**q**

**q**

**q**

n 3 3

n 2m 2m

2 1

**q**^{m}**q**^{m}**q**^{mn}**Π**^{m}

3 n 2m

2m

**W**

**measurement**

**M**

^{motion}**S**

^{shape}**Key Observation: rank(W) <= 3**

**Factorization**

3 3 2

**W**

2**M** **S**

**known** **solve for**

• Factorization Technique

n 3 3

m n 2

2m

Factorization Technique

**– W is at most rank 3 (assuming no noise)**

**– We can use singular value decomposition to factor W:**

3 3

2

**W**

2**M** ' **S** '

**We can use singular value decomposition to factor W:**

n 3 3

m n 2

2m

**– S’ differs from S by a linear transformation A:**

### ) )(

### ( '

### ' **S** **MA** **AS**

**M**

**W**

^{}

^{1}**– Solve for A by enforcing metric constraints on M**

**Metric constraints**

• Orthographic Camera

R f th l ^{}

^{T}_{0}^{1} _{1}^{0}

– Rows of are orthonormal:

• Enforcing “Metric” Constraints

_{0} _{1}

**– Compute A such that rows of M have these **
properties

**M** **A**

**M '** **A** **M** **M** '

Trick (not in original Tomasi/Kanade paper, but in followup work)

**• Constraints are linear in AA**^{T} :

^{T}^{T}^{T}

*T* **A** **A** **G** *where* **G** **AA**

' ' ' '

1 0

0 1

**• Solve for G first by writing equations for every **_{i} **in M**

**• Then G = AA**^{T} **by SVD (since U = V)**

0 1

**Then G AA by SVD (since U V)**

**Factorization with noisy data**

###

### **M** **S** **E**

**W**

n 2m 3 3 n 2m n
2m

• SVD gives this solutiong

**– Provides optimal rank 3 approximation W’ of W**

### '

### **W** **E**

**W**

n 2m n 2m n
2m**W**

**W**

** E**

• Approach

**– Estimate W’, then use noise-free factorization of W’**

as before

– Result minimizes the SSD between positions of image features and projection of the reconstruction

**Results**

**Extensions to factorization methods**

• Projective projection Wi h i i d

• With missing data

• Projective projection with missing data

**Bundle adjustment**

**Bundle adjustment**

**Levenberg-Marquardt method**

• LM can be thought of as a combination of steepest descent and the Newton method steepest descent and the Newton method.

When the current solution is far from the correct one the algorithm behaves like a correct one, the algorithm behaves like a

steepest descent method: slow, but guaranteed to converge When the current solution is close to converge. When the current solution is close to the correct solution, it becomes a Newton’s method

method.

**Nonlinear least square **

### find try to

### , ts measuremen of

### set a

### Given **x**

### Here minimal

### is distance

### squared

### that the so

### vector parameter

### best

### the **p**

*T*

###

###

### ).

### ˆ ( with ˆ ,

### Here, minimal.

### is distance

### squared

**p** **x**

**x**

**x** *f*

###

###

###

**Levenberg-Marquardt method**

**Levenberg-Marquardt method**

• μ=0 → Newton’s method

d h d

• μ→∞ → steepest descent method

• Strategy for choosing μ

– Start with some small μStart with some small μ

– If error is not reduced, keep trying larger μ until it does

– If error is reduced, accept it and reduce μ for the next iteration

**Bundle adjustment**

• Bundle adjustment (BA) is a technique for simultaneously refining the 3D structure and simultaneously refining the 3D structure and camera parameters

It i bl f bt i i ti l

• It is capable of obtaining an optimal

reconstruction under certain assumptions on

i d l F G i

image error models. For zero-mean Gaussian image errors, BA is the maximum likelihood

ti t

estimator.

**Bundle adjustment**

*• n 3D points are seen in m views*

i h j *i f h i h i *i *j*

*• x*_{ij}*is the projection of the i-th point on image j*

*• a*_{j}_{j}*is the parameters for the j-th camera*

*• b*_{i}*is the parameters for the i-th point*

• BA attempts to minimize the projection error

• BA attempts to minimize the projection error

Euclidean distance

predicted projection Euclidean distance

**Bundle adjustment**

**Bundle adjustment**

3 views and 4 points

**Typical Jacobian**

**Block structure of normal equation**

**Bundle adjustment**

**Bundle adjustment**

Multiplied by

**Issues in SFM**

• Track lifetime

N li l di i

• Nonlinear lens distortion

• Degeneracy and critical surfaces

• Prior knowledge and scene constraints

• Multiple motions

• Multiple motions

**Track lifetime**

every 50th frame of a 800-frame sequencey q

**Track lifetime**

lifetime of 3192 tracks from the previous sequencep q

**Track lifetime**

track length histogramg g

**Nonlinear lens distortion**

**Nonlinear lens distortion**

effect of lens distortion

**Prior knowledge and scene constraints**

add a constraint that several lines are parallelp

**Prior knowledge and scene constraints**

add a constraint that it is a turntable sequenceq

**Applications of matchmove**

**Applications of matchmove**

**Jurassic park**

**2d3 boujou**

Enemy at the Gate, Double Negative

**2d3 boujou**

Enemy at the Gate, Double Negative

**Photo Tourism**

**VideoTrace**

http://www.acvt.com.au/research/videotrace/

**Video stabilization**

**Project #3 MatchMove**

• It is more about using tools in this project

Y h i h lib i

• You can choose either calibration or structure from motion to achieve the goal

• Calibration

• Voodoo/Icarus

• Examples from previous classes #1 #2

• Examples from previous classes, #1, #2

**References**

• Richard Hartley, In Defense of the 8-point Algorithm, ICCV, 1995.

• Carlo Tomasi and Takeo Kanade,, Shape and Motion from Image p g Streams: A Factorization Method, Proceedings of Natl. Acad. Sci., 1993.

Manolis Lourakis and Antonis Argyros The Design and

• Manolis Lourakis and Antonis Argyros, The Design and

Implementation of a Generic Sparse Bundle Adjustment Software Package Based on the Levenberg-Marquardt Algorithm, FORTH- ICS/TR 320 2004

ICS/TR-320 2004.

• N. Snavely, S. Seitz, R. Szeliski, Photo Tourism: Exploring Photo Collections in 3D, SIGGRAPH 2006.,

• A. Hengel et. al., VideoTrace: Rapid Interactive Scene Modelling from Video, SIGGRAPH 2007.