# fundamental matrix

(1)

### Structure from motion

Digital Visual Effects Yung-Yu Chuang

with slides by Richard Szeliski, Steve Seitz, Zhengyou Zhang and Marc Pollefeys

(2)

### Outline

• Epipolar geometry and fundamental matrix

• Structure from motion

• Factorization method

• Applications

(3)

(4)

### The epipolar geometry

C,C’,x,x’ and X are coplanar

epipolar geometry demo

(5)

### The epipolar geometry

What if only C,C’,x are known?

(6)

### The epipolar geometry

All points on the plane π project onto l and l’

(7)

### The epipolar geometry

The family of planes π yields families of lines l and l’, which intersect at e and e’

(8)

### The epipolar geometry

epipole (epipolar pole) = intersection of baseline with image plane

= projection of projection center in the other image

epipolar plane = plane containing baseline

epipolar line = intersection of epipolar plane with image

epipolar geometry demo

(9)

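Figure: camera centers C and C’ separated by the baseline T = C’ − C and related by the rotation R; p and p’ are the coordinates of the same scene point in the two camera frames.

$$p = R p' + T$$

Two reference frames are related via the extrinsic parameters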

(10)

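Multiply both sides of $p = R p' + T$ by $[T]_\times$ (note that $[T]_\times T = T \times T = 0$):

$$[T]_\times p = [T]_\times R\, p', \qquad [T]_\times = \begin{bmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{bmatrix}$$

Multiply both sides by $p^T$ (note that $p^T [T]_\times p = p \cdot (T \times p) = 0$):

$$p^T [T]_\times R\, p' = 0$$

$$p^T E p' = 0, \qquad E = [T]_\times R \quad \text{(essential matrix)}$$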

(11)

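$$p^T E p' = 0$$

Let M and M’ be the intrinsic matrices, then

$$p = M^{-1} x, \qquad p' = M'^{-1} x'$$

$$(M^{-1} x)^T E\, (M'^{-1} x') = 0$$

$$x^T M^{-T} E M'^{-1} x' = 0$$

$$x^T F x' = 0, \qquad F = M^{-T} E M'^{-1} \quad \text{(fundamental matrix)}$$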
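The chain from extrinsics to the fundamental matrix can be checked numerically. The sketch below is in plain Python and assumes the frames are related by p = R p' + T; the rotation angle, baseline and intrinsic parameters are made-up values for illustration. It builds E = [T]x R and F = M^-T E M'^-1 and evaluates both epipolar constraints for one scene point.

```python
import math

def matmul(A, B):
    """Matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bilinear(a, A, b):
    """Scalar a^T A b."""
    return sum(ai * yi for ai, yi in zip(a, apply(A, b)))

def skew(t):
    """Cross-product matrix [t]x, so that apply(skew(t), v) equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def intrinsics(f, cx, cy):
    """Hypothetical pinhole intrinsic matrix and its closed-form inverse."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1 / f, 0.0, -cx / f], [0.0, 1 / f, -cy / f], [0.0, 0.0, 1.0]]
    return K, K_inv

# Assumed relative pose: 10 degree rotation about the y axis and baseline T.
angle = math.radians(10)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
T = [1.0, 0.2, 0.1]
E = matmul(skew(T), R)                      # E = [T]x R

# One scene point expressed in both camera frames: p = R p' + T.
p_prime = [0.3, -0.4, 5.0]
p = [r + t for r, t in zip(apply(R, p_prime), T)]
residual_E = bilinear(p, E, p_prime)        # p^T E p'

# Pixel coordinates x = M p, x' = M' p' and F = M^-T E M'^-1.
M, M_inv = intrinsics(800.0, 320.0, 240.0)
M2, M2_inv = intrinsics(700.0, 300.0, 200.0)
F = matmul(matmul(transpose(M_inv), E), M2_inv)
x, x_prime = apply(M, p), apply(M2, p_prime)
residual_F = bilinear(x, F, x_prime)        # x^T F x'
```

Both residuals should vanish up to floating-point rounding, since x and x’ are images of the same scene point.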

(12)

### The fundamental matrix F

• The fundamental matrix is the algebraic representation of epipolar geometry

• The fundamental matrix satisfies the condition that for any pair of corresponding points x↔x’ in the two images

$$x^T F x' = 0 \qquad (x^T l = 0 \text{ with } l = F x')$$

(13)

F is the unique (up to scale) 3x3 rank 2 matrix that satisfies $x^T F x' = 0$ for all x↔x’

1. Transpose: if F is the fundamental matrix for (x,x’), then $F^T$ is the fundamental matrix for (x’,x)

2. Epipolar lines: $l = F x'$ and $l' = F^T x$

3. Epipoles: e lies on all epipolar lines l, thus $e^T F x' = 0$ for all x’, so $e^T F = 0$; similarly $F e' = 0$

4. F has 7 d.o.f., i.e. 3x3 − 1 (homogeneous) − 1 (rank 2)

5. F is a correlation, a projective mapping from a point x’ to a line $l = F x'$ (not a proper correlation, i.e. not invertible)
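Properties 3 and 4 can be illustrated with a small numerical sketch. It assumes a made-up F = [T]x R with identity intrinsics, and it finds the epipoles with a cross-product shortcut rather than an SVD: a null vector of a rank-2 3x3 matrix is orthogonal to all of its rows.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def det3(A):
    """Determinant as the triple product of the rows."""
    return sum(a * c for a, c in zip(A[0], cross(A[1], A[2])))

def null_vector(A):
    """Null vector of a rank-2 3x3 matrix: it is orthogonal to every row,
    so the cross product of two independent rows gives it (up to scale)."""
    candidates = [cross(A[i], A[j]) for i, j in ((0, 1), (0, 2), (1, 2))]
    return max(candidates, key=lambda v: sum(c * c for c in v))

# Assumed example: F = [T]x R with identity intrinsics (so F equals E).
angle = math.radians(20)
R = [[math.cos(angle), -math.sin(angle), 0.0],
     [math.sin(angle), math.cos(angle), 0.0],
     [0.0, 0.0, 1.0]]
T = [1.0, 0.5, 0.2]
T_cross = [[0.0, -T[2], T[1]], [T[2], 0.0, -T[0]], [-T[1], T[0], 0.0]]
F = matmul(T_cross, R)

e_prime = null_vector(F)            # F e' = 0
e = null_vector(transpose(F))       # e^T F = 0
```

In this setup det F is zero and e is parallel to T, which matches the description of the epipole as the projection of the other camera center.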

(14)

### The fundamental matrix F

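• It can be used to

– simplify matching: the match of x’ only needs to be searched for along the epipolar line $l = F x'$

– detect wrong matches: a pair with $x^T F x'$ far from 0 violates the epipolar constraint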
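A minimal sketch of the second use, assuming points are given as (u, v, 1) and using a hypothetical F for a purely horizontal camera shift, so that epipolar lines are image rows. The function names and the one-pixel tolerance are illustrative choices, not part of the slides.

```python
import math

def epipolar_line(F, x_prime):
    """Epipolar line l = F x' in the first image for a point x' in the second."""
    return [sum(f * c for f, c in zip(row, x_prime)) for row in F]

def distance_to_line(x, line):
    """Distance in pixels from the homogeneous point x = (u, v, 1) to the line."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c * x[2]) / math.hypot(a, b)

def is_plausible_match(x, x_prime, F, tol=1.0):
    """Keep a match only if x lies within tol pixels of the epipolar line of x'."""
    return distance_to_line(x, epipolar_line(F, x_prime)) <= tol

# Assumed F for a purely horizontal camera shift: epipolar lines are scanlines.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

good = is_plausible_match((10.0, 5.0, 1.0), (3.0, 5.0, 1.0), F)   # same row
bad = is_plausible_match((10.0, 8.0, 1.0), (3.0, 5.0, 1.0), F)    # 3 rows apart
```

Using the point-to-line distance instead of the raw value of x^T F x' gives a threshold in pixels, which is easier to choose.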

(15)

### Estimation of F — 8-point algorithm

• The fundamental matrix F is defined by

$$x^T F x' = 0$$

for any pair of matches x and x’ in two images.

• Let $x=(u,v,1)^T$ and $x'=(u',v',1)^T$,

$$F = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}$$

each match gives a linear equation

$$uu' f_{11} + uv' f_{12} + u f_{13} + vu' f_{21} + vv' f_{22} + v f_{23} + u' f_{31} + v' f_{32} + f_{33} = 0$$
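The linear equation contributed by one match can be written as a row of coefficients multiplying (f11, f12, ..., f33). A small sketch, with an assumed F (horizontal camera shift) used only to check the row against a known answer:

```python
def constraint_row(x, x_prime):
    """Coefficients of (f11, f12, ..., f33) in x^T F x' = 0 for one match."""
    u, v = x
    up, vp = x_prime
    return [u * up, u * vp, u, v * up, v * vp, v, up, vp, 1.0]

def residual(row, F):
    """Dot product of a constraint row with F flattened row by row."""
    f = [entry for F_row in F for entry in F_row]
    return sum(a * b for a, b in zip(row, f))

# Assumed F: horizontal camera shift, so matches share the same image row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

row_ok = constraint_row((10.0, 5.0), (3.0, 5.0))
row_bad = constraint_row((10.0, 8.0), (3.0, 5.0))
```

Stacking one such row per match gives the matrix of the homogeneous linear system in the nine entries of F.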

(16)

### 8-point algorithm

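$$\begin{bmatrix}
u_1u'_1 & u_1v'_1 & u_1 & v_1u'_1 & v_1v'_1 & v_1 & u'_1 & v'_1 & 1 \\
u_2u'_2 & u_2v'_2 & u_2 & v_2u'_2 & v_2v'_2 & v_2 & u'_2 & v'_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_nu'_n & u_nv'_n & u_n & v_nu'_n & v_nv'_n & v_n & u'_n & v'_n & 1
\end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$

$$A f = 0$$

• In reality, instead of solving $Af = 0$, we seek f to minimize $\|Af\|$ subject to $\|f\| = 1$. The solution is the right singular vector of A corresponding to the smallest singular value.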

(17)

### 8-point algorithm

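• To enforce that F is of rank 2, F is replaced by F’ that minimizes $\|F - F'\|$ (Frobenius norm) subject to $\det F' = 0$.

• It is achieved by SVD. Let $F = U \Sigma V^T$, where

$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{bmatrix}, \quad \text{let} \quad \Sigma' = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

then $F' = U \Sigma' V^T$ is the solution.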

(18)

### 8-point algorithm

% Build the constraint matrix: one row per match, encoding x2'*F*x1 = 0
% (x1, x2 are 3 x npts homogeneous points)
npts = size(x1,2);
A = [x2(1,:)'.*x1(1,:)' x2(1,:)'.*x1(2,:)' x2(1,:)' ...
     x2(2,:)'.*x1(1,:)' x2(2,:)'.*x1(2,:)' x2(2,:)' ...
     x1(1,:)' x1(2,:)' ones(npts,1) ];
[U,D,V] = svd(A);
% Extract fundamental matrix from the column of V
% corresponding to the smallest singular value.
F = reshape(V(:,9),3,3)';
% Enforce rank2 constraint
[U,D,V] = svd(F);
F = U*diag([D(1,1) D(2,2) 0])*V';

(19)

### 8-point algorithm

• Pros: it is linear, easy to implement and fast

• Cons: susceptible to noise

(20)

### Problem with 8-point algorithm

$$\begin{bmatrix}
u_1u'_1 & u_1v'_1 & u_1 & v_1u'_1 & v_1v'_1 & v_1 & u'_1 & v'_1 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_nu'_n & u_nv'_n & u_n & v_nu'_n & v_nv'_n & v_n & u'_n & v'_n & 1
\end{bmatrix}
\begin{bmatrix} f_{11} \\ \vdots \\ f_{33} \end{bmatrix} = 0$$

Typical column magnitudes for pixel coordinates:

~10000 ~10000 ~100 ~10000 ~10000 ~100 ~100 ~100 1

Orders of magnitude difference between columns of the data matrix → least-squares yields poor results

(21)

### Normalized 8-point algorithm

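1. Transform input by $\hat{x}_i = T x_i$, $\hat{x}'_i = T' x'_i$

2. Call 8-point on $\hat{x}_i, \hat{x}'_i$ to obtain $\hat{F}$

3. $F = T^T \hat{F}\, T'$ (so that $x^T F x' = \hat{x}^T \hat{F} \hat{x}'$)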
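Step 3 can be sanity-checked with a short sketch. It uses the convention x^T F x' = 0 from the earlier slides, with made-up similarity transforms T, T' and a made-up normalized estimate F_hat; all numeric values are illustrative. The point is that the denormalized F gives the same constraint value on the original points as F_hat gives on the normalized points.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bilinear(a, A, b):
    """Scalar a^T A b."""
    return sum(ai * yi for ai, yi in zip(a, apply(A, b)))

def similarity(scale, cx, cy):
    """Hypothetical normalizing transform: shift (cx, cy) to the origin, then scale."""
    return [[scale, 0.0, -scale * cx], [0.0, scale, -scale * cy], [0.0, 0.0, 1.0]]

T1 = similarity(0.01, 320.0, 240.0)      # T, applied to points x
T2 = similarity(0.02, 300.0, 200.0)      # T', applied to points x'
F_hat = [[0.0, 0.0, 0.0],                # stand-in for the 8-point result
         [0.0, 0.0, -1.0],               # on normalized points
         [0.0, 1.0, 0.0]]
F = matmul(matmul(transpose(T1), F_hat), T2)   # denormalize: F = T^T F_hat T'

pairs = [([400.0, 300.0, 1.0], [350.0, 230.0, 1.0]),
         ([400.0, 350.0, 1.0], [350.0, 230.0, 1.0])]
gaps = [abs(bilinear(x, F, xp) - bilinear(apply(T1, x), F_hat, apply(T2, xp)))
        for x, xp in pairs]
```

The equality holds for any pair of points, matched or not, because it is an algebraic identity.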

(22)

### Normalized 8-point algorithm

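Transform image to ~[-1,1]x[-1,1]

For a 700×500 image: (0,0) → (−1,−1), (700,0) → (1,−1), (0,500) → (−1,1), (700,500) → (1,1), and the image center → (0,0)

$$T = \begin{bmatrix} \frac{2}{700} & 0 & -1 \\ 0 & \frac{2}{500} & -1 \\ 0 & 0 & 1 \end{bmatrix}$$

normalized least squares yields good results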
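The effect of this transform on the data matrix can be seen on a single constraint row. The sketch below assumes a 700×500 image and one made-up match; `spread` is an illustrative helper that reports the ratio between the largest and smallest coefficient magnitudes in a row.

```python
def box_normalization(width, height):
    """T mapping [0, width] x [0, height] to [-1, 1] x [-1, 1]."""
    return [[2.0 / width, 0.0, -1.0],
            [0.0, 2.0 / height, -1.0],
            [0.0, 0.0, 1.0]]

def apply(T, point):
    """Apply a 3x3 transform to the 2D point (u, v)."""
    u, v = point
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in T)
    return (x / w, y / w)

def constraint_row(x, x_prime):
    """Coefficients of (f11, ..., f33) in x^T F x' = 0 for one match."""
    u, v = x
    up, vp = x_prime
    return [u * up, u * vp, u, v * up, v * vp, v, up, vp, 1.0]

def spread(row):
    """Ratio between the largest and smallest coefficient magnitudes."""
    magnitudes = [abs(a) for a in row]
    return max(magnitudes) / min(magnitudes)

T = box_normalization(700.0, 500.0)
x, x_prime = (650.0, 420.0), (630.0, 410.0)          # made-up pixel match
spread_pixels = spread(constraint_row(x, x_prime))
spread_normalized = spread(constraint_row(apply(T, x), apply(T, x_prime)))
```

For this match the pixel-coordinate row spans more than five orders of magnitude, while the normalized row stays within a small factor.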

(23)

### Normalized 8-point algorithm

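% Normalise each set of points so that the centroid is at the origin
% and the mean distance from the origin is sqrt(2)
[x1, T1] = normalise2dpts(x1);
[x2, T2] = normalise2dpts(x2);
npts = size(x1,2);
% Build the constraint matrix from the normalised points
A = [x2(1,:)'.*x1(1,:)' x2(1,:)'.*x1(2,:)' x2(1,:)' ...
     x2(2,:)'.*x1(1,:)' x2(2,:)'.*x1(2,:)' x2(2,:)' ...
     x1(1,:)' x1(2,:)' ones(npts,1) ];
[U,D,V] = svd(A);
F = reshape(V(:,9),3,3)';
% Enforce rank2 constraint
[U,D,V] = svd(F);
F = U*diag([D(1,1) D(2,2) 0])*V';
% Denormalise
F = T2'*F*T1;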

(24)

### Normalization

function [newpts, T] = normalise2dpts(pts)
% pts: 3 x N homogeneous points, assumed scaled so that pts(3,:) == 1
c = mean(pts(1:2,:)')'; % Centroid
newp(1,:) = pts(1,:)-c(1); % Shift origin to centroid.
newp(2,:) = pts(2,:)-c(2);
meandist = mean(sqrt(newp(1,:).^2 + newp(2,:).^2));
scale = sqrt(2)/meandist;
T = [scale 0 -scale*c(1)
     0 scale -scale*c(2)
     0 0 1 ];
newpts = T*pts;

(25)

### RANSAC

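repeat

  select minimal sample (8 matches)

  compute solution(s) for F

  determine inliers

until Γ(#inliers,#samples) > 95% or too many times

compute F based on all inliers

Here Γ(#inliers,#samples) is the confidence that at least one outlier-free sample has been drawn so far.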
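The loop above can be sketched generically in Python. Two assumptions are made here: the confidence is taken as 1 − (1 − w^s)^N, a common choice with w the current inlier ratio, s the sample size and N the number of samples; and the model is swapped for a 2D line (minimal sample of 2 points) so that the example runs with the standard library only. Estimating F would plug an 8-point solver into `fit` and an epipolar distance into `error`. All function names are illustrative.

```python
import math
import random

def confidence(inlier_ratio, n_samples, sample_size):
    """Probability that at least one of n_samples minimal samples is outlier-free."""
    return 1.0 - (1.0 - inlier_ratio ** sample_size) ** n_samples

def ransac(data, fit, error, sample_size, threshold,
           target=0.95, max_samples=1000, seed=0):
    """Generic RANSAC loop following the repeat/until structure above."""
    rng = random.Random(seed)
    best_inliers, n_samples = [], 0
    while n_samples < max_samples:                       # "or too many times"
        model = fit(rng.sample(data, sample_size))       # minimal sample
        n_samples += 1
        if model is None:                                # degenerate sample
            continue
        inliers = [d for d in data if error(model, d) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
        ratio = len(best_inliers) / len(data)
        if confidence(ratio, n_samples, sample_size) > target:
            break
    return best_inliers, n_samples

# Stand-in model: a 2D line a*x + b*y + c = 0 with a^2 + b^2 = 1.
def fit_line(sample):
    (x1, y1), (x2, y2) = sample
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None
    return (a / norm, b / norm, -(a * x1 + b * y1) / norm)

def line_error(line, point):
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c)

on_line = [(float(x), 2.0 * x + 1.0) for x in range(20)]     # y = 2x + 1
outliers = [(float(x), 80.0 + x * x) for x in range(10)]     # far from the line
inliers, n_samples = ransac(on_line + outliers, fit_line, line_error,
                            sample_size=2, threshold=0.01)
```

The final step of the slide, re-estimating the model from all inliers, is left out of the sketch; it would be one more call to a least-squares fit on `inliers`.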

(26)

(27)

(28)

(29)

(30)

### Structure from motion

Structure from motion: automatic recovery of camera motion and scene structure from two or more images. It is a self-calibration technique, also called automatic camera tracking or matchmoving.

Unknown camera viewpoints

(31)

### Applications

• For computer vision, multiple-view shape reconstruction, novel view synthesis and autonomous vehicle navigation.

• For film production, seamless insertion of CGI into live-action backgrounds

(32)

### Matchmove

example #1 example #2 example #3

(33)

### Structure from motion

SFM pipeline: 2D feature tracking → 3D estimation → optimization (bundle adjust) → geometry fitting

(34)

### Structure from motion

• Step 1: Track Features

– Detect good features, Shi & Tomasi, SIFT

– Find correspondences between frames

• Lucas & Kanade-style motion estimation

• window-based correlation

• SIFT matching

(35)

### KLT tracking

http://www.ces.clemson.edu/~stb/klt/

(36)

### Structure from Motion

• Step 2: Estimate Motion and Structure

– Simplified projection model, e.g., [Tomasi 92]

– 2 or 3 views at a time [Hartley 00]

(37)

### Structure from Motion

• Step 3: Refine estimates

– “Bundle adjustment” in photogrammetry

– Other iterative methods

(38)

### Structure from Motion

• Step 4: Recover surfaces (image-based triangulation, silhouettes, stereo…)

Good mesh

(39)

(40)

(41)

### Notations

• n 3D points are seen in m views

• q=(u,v,1): 2D image point

• p=(x,y,z,1): 3D scene point

• Π: projection matrix

• π: projection function

• $q_{ij}$ is the projection of the i-th point on image j

• $\lambda_{ij}$: projective depth of $q_{ij}$

$$q_{ij} = \pi(\Pi_j p_i), \qquad \pi(x,y,z) = (x/z,\ y/z), \qquad \lambda_{ij} = z$$
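As a small illustration of this notation, the sketch below applies a 3×4 projection matrix to a homogeneous scene point and then the projection function (division by the third coordinate). The camera values are hypothetical.

```python
def project(P, p):
    """Return the image point pi(P p) and its projective depth."""
    x, y, z = (sum(a * b for a, b in zip(row, p)) for row in P)
    return (x / z, y / z), z

# Hypothetical camera at the origin: focal length 500, principal point (320, 240).
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
p = (1.0, 2.0, 10.0, 1.0)
q, depth = project(P, p)
```

For this camera the projective depth is simply the z coordinate of the point in the camera frame.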

(42)

### Structure from motion

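• Estimate $\Pi_j$ and $p_i$ to minimize the negative log-likelihood

$$\varepsilon(\Pi_1,\ldots,\Pi_m,p_1,\ldots,p_n) = -\sum_{j=1}^{m}\sum_{i=1}^{n} w_{ij} \log P\big(\pi(\Pi_j p_i);\, q_{ij}\big)$$

$$w_{ij} = \begin{cases} 1 & \text{if } p_i \text{ is visible in view } j \\ 0 & \text{otherwise} \end{cases}$$

• Assume isotropic Gaussian noise, it is reduced to

$$\varepsilon(\Pi_1,\ldots,\Pi_m,p_1,\ldots,p_n) = \sum_{j=1}^{m}\sum_{i=1}^{n} w_{ij} \left\| \pi(\Pi_j p_i) - q_{ij} \right\|^2$$

• Start from a simpler projection model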
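The sum-of-squares objective can be written down directly. In this sketch the visibility weights are encoded by which (point, view) pairs appear in a dictionary of observations; the cameras, points and observed positions are made-up values, and the function names are illustrative.

```python
def project(P, p):
    """Image point pi(P p) for a 3x4 matrix P and homogeneous point p."""
    x, y, z = (sum(a * b for a, b in zip(row, p)) for row in P)
    return x / z, y / z

def reprojection_error(cameras, points, observations):
    """Sum of squared image distances over all visible (point i, view j) pairs.
    observations maps (i, j) to the measured (u, v); a missing key means w_ij = 0."""
    total = 0.0
    for (i, j), (u, v) in observations.items():
        pu, pv = project(cameras[j], points[i])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total

cameras = [
    [[500.0, 0.0, 320.0, 0.0], [0.0, 500.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[500.0, 0.0, 320.0, -500.0], [0.0, 500.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
]
points = [(1.0, 2.0, 10.0, 1.0), (-1.0, 0.0, 5.0, 1.0)]
observations = {
    (0, 0): (370.0, 340.0),   # exact projection
    (0, 1): (323.0, 344.0),   # off by (3, 4) pixels
    (1, 0): (220.0, 240.0),   # exact projection; point 1 is not seen in view 1
}
error = reprojection_error(cameras, points, observations)
```

Only the perturbed observation contributes, so the total is 3² + 4² = 25.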

(43)

### Orthographic projection

• Special case of perspective projection

– Distance from the COP to the PP is infinite

– Also called “parallel projection”: (x, y, z) → (x, y)

Image World

(44)

### SFM under orthographic projection

$$q_{2\times 1} = \Pi_{2\times 3}\, p_{3\times 1} + t_{2\times 1}$$

q: 2D image point; Π: orthographic projection incorporating 3D rotation; p: 3D scene point; t: image offset

• Trick

– Choose scene origin to be centroid of 3D points

– Choose image origins to be centroid of 2D points

– Allows us to drop the camera translation:

$$q = \Pi p$$
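The centroid trick can be verified numerically. The sketch assumes a made-up 2×3 orthographic projection (the first two rows of a rotation about the y axis), an arbitrary image offset t and four scene points; after subtracting the centroids, the offset no longer appears.

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def subtract(v, c):
    return [a - b for a, b in zip(v, c)]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

angle = math.radians(30)
Pi = [[math.cos(angle), 0.0, math.sin(angle)],     # first two rows of a rotation
      [0.0, 1.0, 0.0]]
t = [5.0, -3.0]                                     # image offset
points = [[1.0, 2.0, 3.0], [-2.0, 0.5, 4.0], [0.0, -1.0, 6.0], [3.0, 1.5, 5.0]]
images = [[a + b for a, b in zip(apply(Pi, p), t)] for p in points]   # q = Pi p + t

p_bar, q_bar = centroid(points), centroid(images)
centered_points = [subtract(p, p_bar) for p in points]
centered_images = [subtract(q, q_bar) for q in images]

# Largest difference between a centered image point and Pi times the centered point.
gap = max(abs(a - b)
          for q, p in zip(centered_images, centered_points)
          for a, b in zip(q, apply(Pi, p)))
```

This works because the centroid of the image points equals Π times the centroid of the scene points plus t, so subtracting it cancels t.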

(45)

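projection of n features in one image:

$$\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}_{2\times n} = \Pi_{2\times 3} \begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix}_{3\times n}$$

projection of n features in m images ($q_{ij}$ is the projection of point i in view j)

$$\underbrace{\begin{bmatrix} q_{11} & q_{21} & \cdots & q_{n1} \\ q_{12} & q_{22} & \cdots & q_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ q_{1m} & q_{2m} & \cdots & q_{nm} \end{bmatrix}}_{W_{2m\times n}\ \text{(measurement)}} = \underbrace{\begin{bmatrix} \Pi_1 \\ \Pi_2 \\ \vdots \\ \Pi_m \end{bmatrix}}_{M_{2m\times 3}\ \text{(motion)}} \underbrace{\begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix}}_{S_{3\times n}\ \text{(shape)}}$$

Key Observation: rank(W) <= 3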
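The rank observation can be checked on a tiny synthetic example. The sketch below assumes three made-up orthographic views (rotations about the y axis) and five centered scene points, builds W = M S, and computes the rank with a plain Gaussian elimination; the `rank` helper and its tolerance are illustrative.

```python
import math

def rank(A, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in A]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue                       # no usable pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r

def view_rows(angle_deg):
    """Two rows of an orthographic camera rotated about the y axis."""
    a = math.radians(angle_deg)
    return [[math.cos(a), 0.0, math.sin(a)], [0.0, 1.0, 0.0]]

M = [row for angle in (0, 20, 40) for row in view_rows(angle)]     # 6 x 3 motion
S = [[1.0, 0.0, 0.0, -1.0, 0.0],                                   # 3 x 5 shape,
     [0.0, 1.0, 0.0, -1.0, 0.0],                                   # columns are
     [0.0, 0.0, 1.0, 0.0, -1.0]]                                   # centered points
W = [[sum(m * s for m, s in zip(M_row, S_col)) for S_col in zip(*S)]
     for M_row in M]                                               # 6 x 5 measurement
```

Although W is 6×5, its rank equals 3 here; with noisy measurements the rank would only be approximately 3, which is why the factorization uses a truncated SVD.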

(46)

### Factorization

$$\underbrace{W_{2m\times n}}_{\text{known}} = \underbrace{M_{2m\times 3}\, S_{3\times n}}_{\text{solve for}}$$

• Factorization Technique

– W is at most rank 3 (assuming no noise)

– We can use singular value decomposition to factor W:

$$W_{2m\times n} = M'_{2m\times 3}\, S'_{3\times n}$$

– S’ differs from S by a linear transformation A:

$$W = M'S' = (M A^{-1})(A S)$$

– Solve for A by enforcing metric constraints on M
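The metric constraints can be spelled out as follows. This is a sketch in the spirit of the Tomasi-Kanade formulation, and the row symbols $\hat{r}_{j1}, \hat{r}_{j2}$ are introduced here only for illustration. Since $W = M'S' = (MA^{-1})(AS)$, the true motion matrix is $M = M'A$. Each $\Pi_j$ consists of the first two rows of a rotation, so its two rows are orthonormal. Writing the two rows of $M'$ that belong to view j as $\hat{r}_{j1}^T$ and $\hat{r}_{j2}^T$ gives:

```latex
\begin{aligned}
\hat{r}_{j1}^{T}\, Q\, \hat{r}_{j1} &= 1, \\
\hat{r}_{j2}^{T}\, Q\, \hat{r}_{j2} &= 1, \\
\hat{r}_{j1}^{T}\, Q\, \hat{r}_{j2} &= 0,
\qquad j = 1, \ldots, m, \qquad Q = A A^{T}.
\end{aligned}
```

These are 3m linear equations in the 6 distinct entries of the symmetric matrix Q. A can then be recovered from Q, for example by a Cholesky factorization when Q is positive definite, up to an arbitrary rotation of the scene.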

(47)

(48)

### Extensions to factorization methods

• Projective projection

• With missing data

• Projective projection with missing data

(49)

(50)

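• n 3D points are seen in m views

• $x_{ij}$ is the projection of the i-th point on image j

• $a_j$ is the vector of parameters for the j-th camera

• $b_i$ is the vector of parameters for the i-th point

• BA attempts to minimize the projection error

$$\min_{a_j,\, b_i} \sum_{i=1}^{n} \sum_{j=1}^{m} d\big(Q(a_j, b_i),\, x_{ij}\big)^2$$

where $Q(a_j, b_i)$ is the predicted projection of point i on image j and $d(\cdot,\cdot)$ is the Euclidean distance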

(51)

### Levenberg-Marquardt method

• LM can be thought of as a combination of steepest descent and the Gauss-Newton method. When the current solution is far from the correct one, the algorithm behaves like a steepest descent method: slow, but guaranteed to converge. When the current solution is close to the correct solution, it becomes a Gauss-Newton method.
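The damping idea can be shown on a deliberately tiny problem. This sketch is not a bundle adjuster: it fits the single parameter b of an assumed model y = exp(b·x) to noise-free data, using the Gauss-Newton approximation J^T J plus a damping term λ. The ×10 and ÷10 update rule for λ is one simple choice among several.

```python
import math

def levenberg_marquardt(xs, ys, b=0.0, lam=1e-3, iters=50):
    """Fit y = exp(b * x) by damped Gauss-Newton steps on the scalar parameter b."""
    def cost(value):
        return sum((math.exp(value * x) - y) ** 2 for x, y in zip(xs, ys))

    current = cost(b)
    for _ in range(iters):
        residuals = [math.exp(b * x) - y for x, y in zip(xs, ys)]
        jacobian = [x * math.exp(b * x) for x in xs]
        gradient = sum(j * r for j, r in zip(jacobian, residuals))   # J^T r
        hessian = sum(j * j for j in jacobian)                       # J^T J
        step = -gradient / (hessian + lam)     # large lam: short gradient-like step
        trial = cost(b + step)
        if trial < current:
            b, current = b + step, trial
            lam /= 10.0                        # trust the Gauss-Newton model more
        else:
            lam *= 10.0                        # reject the step, damp more strongly
    return b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.5 * x) for x in xs]           # synthetic data with b = 0.5
b_hat = levenberg_marquardt(xs, ys)
```

Starting from b = 0, the undamped step overshoots and is rejected; λ then grows until a step reduces the cost, and shrinks again as the estimate approaches 0.5.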

(52)

(53)

(54)

(55)

(56)

### 2d3 boujou

Enemy at the Gates, Double Negative

(57)


(58)

(59)

### VideoTrace

http://www.acvt.com.au/research/videotrace/

(60)

(61)

### References

• Richard Hartley, In Defense of the 8-point Algorithm, ICCV, 1995.

• Carlo Tomasi and Takeo Kanade, Shape and Motion from Image Streams: A Factorization Method, Proceedings of Natl. Acad. Sci., 1993.

• Manolis Lourakis and Antonis Argyros, The Design and Implementation of a Generic Sparse Bundle Adjustment Software Package Based on the Levenberg-Marquardt Algorithm, FORTH-ICS/TR-320 2004.

• N. Snavely, S. Seitz, R. Szeliski, Photo Tourism: Exploring Photo Collections in 3D, SIGGRAPH 2006.

• A. Hengel et al., VideoTrace: Rapid Interactive Scene Modelling from Video, SIGGRAPH 2007.

(62)

### Project #3 MatchMove

• It is more about using tools in this project

• You can choose either calibration or structure from motion to achieve the goal

• Calibration

• Voodoo/Icarus

• Examples from previous classes, #1, #2

• Why blending: parallax, lens distortion, scene motion, exposure difference

