## Chapter 4: Computer Vision

### Augmented Reality – Principles and Practice

### http://www.augmentedrealitybook.org

### Geometrical Transformations: 3D Fundamentals

**Why a homogeneous coordinate system?**

### Translation

• P' = T + P … (I)

### Scaling

• P' = S × P … (II)

### Rotation

• P' = R × P … (III)

T: translation matrix, S: scaling matrix, R: rotation matrix

(II) and (III) are matrix multiplications, but (I) is an addition. With homogeneous coordinates, translation also becomes a multiplication, so all three transformations can be composed into a single matrix.

### 3D Rotation & Translation

### Rotation axes: X, Y, Z

### Geometrical Transformation

• To be able to compose transformations, each point (x, y, z) is represented as (x, y, z, w), where w is usually 1. Translation then becomes

$$
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & d_x \\
0 & 1 & 0 & d_y \\
0 & 0 & 1 & d_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$
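The translation above can be sketched in a few lines of pure Python (no external libraries), multiplying the 4×4 homogeneous matrix by a point with w = 1:

```python
# A minimal sketch of applying a 4x4 homogeneous translation matrix to a point.

def mat_vec(M, v):
    """Multiply a 4x4 matrix by a 4-vector."""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def translation(dx, dy, dz):
    """Homogeneous translation matrix T(dx, dy, dz)."""
    return [[1, 0, 0, dx],
            [0, 1, 0, dy],
            [0, 0, 1, dz],
            [0, 0, 0, 1]]

p = [2.0, 3.0, 4.0, 1.0]                     # point (2, 3, 4) with w = 1
p2 = mat_vec(translation(10, 0, -1), p)
# p2 == [12.0, 3.0, 3.0, 1.0]
```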

### 3D rotation

$$
R_z(\theta) = \begin{bmatrix}
\cos\theta & -\sin\theta & 0 & 0 \\
\sin\theta & \cos\theta & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\quad
R_x(\theta) = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta & -\sin\theta & 0 \\
0 & \sin\theta & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\quad
R_y(\theta) = \begin{bmatrix}
\cos\theta & 0 & \sin\theta & 0 \\
0 & 1 & 0 & 0 \\
-\sin\theta & 0 & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

### 3D Translation & Scaling

• Translation

$$
T(d_x, d_y, d_z) =
\begin{bmatrix}
1 & 0 & 0 & d_x \\
0 & 1 & 0 & d_y \\
0 & 0 & 1 & d_z \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix}
= T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$

• Scaling

$$
S(s_x, s_y, s_z) =
\begin{bmatrix}
s_x & 0 & 0 & 0 \\
0 & s_y & 0 & 0 \\
0 & 0 & s_z & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix}
= S \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$
### Scaling

**S = S(s_x, s_y, s_z)**, with x' = s_x·x, y' = s_y·y, z' = s_z·z, i.e. **p' = S p**:

$$
S =
\begin{bmatrix}
s_x & 0 & 0 & 0 \\
0 & s_y & 0 & 0 \\
0 & 0 & s_z & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

Expand or contract along each axis (fixed point at the origin).

Angel and Shreiner: Interactive Computer Graphics 7E © Addison-Wesley 2015

### Reflection

Reflection corresponds to negative scale factors:

• s_x = −1, s_y = 1: mirror about the y-axis
• s_x = −1, s_y = −1: mirror about both axes
• s_x = 1, s_y = −1: mirror about the x-axis

### More on Geometrical Transformations

• Affine transformations
  • Preserve parallelism of lines, but not lengths and angles.
  • Rotation, translation, scaling, and shear transformations are affine.
• Shear

A 2D shear along x by a factor a:

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
1 & a & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad x' = x + a\,y, \quad y' = y
$$

### 3D shear

$$
SH_{xy}(sh_x, sh_y) =
\begin{bmatrix}
1 & 0 & sh_x & 0 \\
0 & 1 & sh_y & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad x' = x + sh_x\,z, \quad y' = y + sh_y\,z
$$

### Shear

• Helpful to add one more basic transformation
• Equivalent to pulling faces in opposite directions

### Rotation by shearing, a 2D example


### Transformation Matrix

Any number of rotation, scaling, and translation matrices can be multiplied together into a single transformation matrix of the form

$$
M =
\begin{bmatrix}
A & T \\
\mathbf{0}^\top & 1
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & t_x \\
a_{21} & a_{22} & a_{23} & t_y \\
a_{31} & a_{32} & a_{33} & t_z \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

Note the difference between post-multiplication and pre-multiplication: the order of the factors determines whether each successive transformation is applied in the local or in the global coordinate frame.

### Rotation Representation with Quaternions

www.augmentedrealitybook.org Computer Vision 15

### Point P, after the rotation given by the unit quaternion q, is P' = q P q⁻¹, where P is embedded as the pure quaternion (0, p).
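The quaternion rotation P' = q P q⁻¹ can be sketched in pure Python; for a unit quaternion the inverse equals the conjugate:

```python
# A minimal sketch of rotating a point with a unit quaternion: p' = q p q^-1,
# with the point embedded as the pure quaternion (0, x, y, z).
import math

def q_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def q_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def axis_angle_to_q(axis, angle):
    """Unit quaternion for a rotation of `angle` about the unit `axis`."""
    s = math.sin(angle / 2)
    return (math.cos(angle / 2), axis[0]*s, axis[1]*s, axis[2]*s)

def rotate(p, q):
    """Rotate 3D point p by unit quaternion q (conjugate == inverse)."""
    w, x, y, z = q_mul(q_mul(q, (0.0,) + tuple(p)), q_conj(q))
    return (x, y, z)

q = axis_angle_to_q((0, 0, 1), math.pi / 2)   # 90 degrees about z
# rotate((1.0, 0.0, 0.0), q) is approximately (0, 1, 0)
```

This also illustrates the "easy to construct, easy to read off" point from the next slide: axis and angle map directly to the quaternion components.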

### Advantages of quaternions

• The representation of a rotation as a quaternion (4 numbers) is more compact than the representation as an orthogonal matrix (9 numbers). Furthermore, for a given axis and angle, one can easily construct the corresponding quaternion, and conversely, for a given quaternion one can easily read off the axis and the angle. Both of these are much harder with matrices or Euler angles.

• In video games and other applications, one is often interested in "smooth rotations": the scene should rotate gradually rather than in a single step. This can be accomplished by choosing a curve such as spherical linear interpolation in the quaternions, with one endpoint being the identity transformation (or some other initial rotation) and the other being the intended final rotation. This is more problematic with other representations of rotations.

### To solve these problems efficiently: use OpenCV

• OpenCV (Open Source Computer Vision Library) is a cross-platform computer vision library. It was initiated and co-developed by Intel, is released under the BSD license, and is free to use in both commercial and research settings. OpenCV can be used to develop real-time image processing, computer vision, and pattern recognition programs, and can be accelerated with Intel's IPP library.

• OpenCV can be applied to problems in areas such as: augmented reality, face recognition, gesture recognition, human–computer interaction, action recognition, motion tracking, object recognition, image segmentation, and robotics.

### OpenCV

OpenCV is written in C++, and its primary interface is also C++, but it still retains a large C interface. The library also has extensive Python, Java, and MATLAB/Octave (version 2.5) interfaces; the API functions for these languages are documented online. Support for C#, Ch, and Ruby is also available. The second major version, OpenCV 2.0, was released in October 2009.

### A typical example in OpenCV: Optimization

• What is the best practice for solving the least-squares problem Ax = b, where A is a matrix and b, x are vectors?

It often happens that Ax = b has no exact solution. The usual reason: too many equations — the matrix has more rows than columns.

To repeat: we cannot always get the error e = b − Ax down to zero. When e is zero, x is an exact solution to Ax = b. When the length of e is as small as possible, x̂ is a least-squares solution. Our goal in this section is to compute x̂ and use it. These are real problems and they need an answer.

If the model is linear, the normal equations AᵀA x̂ = Aᵀb give x̂ = (AᵀA)⁻¹Aᵀb. If the model is nonlinear, use an iterative method.

In OpenCV, use the function solve(A, b, x, DECOMP_SVD);
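The normal-equations route can be sketched in pure Python for the smallest interesting case, fitting a line y = c0 + c1·t to more points than unknowns (OpenCV's `solve(A, b, x, DECOMP_SVD)` does the same job more robustly via SVD):

```python
# A minimal sketch of least squares via the normal equations A^T A x = A^T b,
# specialized to a 2-parameter line fit so the 2x2 system can be solved by hand.

def lstsq_line(ts, ys):
    """Fit y = c0 + c1*t by solving the 2x2 normal equations directly."""
    n = len(ts)
    s_t = sum(ts)
    s_tt = sum(t * t for t in ts)
    s_y = sum(ys)
    s_ty = sum(t * y for t, y in zip(ts, ys))
    # Normal equations:
    #   [ n    s_t  ] [c0]   [ s_y  ]
    #   [ s_t  s_tt ] [c1] = [ s_ty ]
    det = n * s_tt - s_t * s_t
    c0 = (s_y * s_tt - s_t * s_ty) / det
    c1 = (n * s_ty - s_t * s_y) / det
    return c0, c1

# Overdetermined system: 4 equations, 2 unknowns.
c0, c1 = lstsq_line([0, 1, 2, 3], [1, 3, 5, 7])
# The points lie exactly on y = 1 + 2t, so c0 = 1, c1 = 2
```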

### First Application: Marker Tracking

1. Capture an image with a known camera
2. Search for quadrilaterals
3. Pose estimation from homography
4. Pose refinement: minimize the nonlinear projection error
5. Use the final pose

Image: Daniel Wagner

### Pinhole Camera

**• Project 3D point q to 2D point p**
**• Center of projection c**
• Image plane π
**• Principal point c'**
*• Focal length f*
**• p = M q = K [R|t] q**
**• 5 DOF for K, 3 DOF for R, 3 DOF for t**
**• Assume we know K (for now)**

(Figure: the optical axis runs from the center of projection **c** through the principal point **c'** on the image plane π at focal length f; the 3D point **q** projects to the image point **p** with image coordinates u, v.)
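The projection p = K [R|t] q can be sketched numerically; the K used here is an assumed example intrinsic matrix (focal length 800, principal point at 320, 240), not one from the text:

```python
# A minimal sketch of the pinhole projection p = K [R|t] q, followed by the
# perspective division that converts homogeneous p to pixel coordinates.

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

K = [[800,   0, 320],      # assumed example intrinsics
     [  0, 800, 240],
     [  0,   0,   1]]

# [R|t]: identity rotation, camera translated so the point sits 4 units ahead.
Rt = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 4]]

q = [1.0, 0.5, 0.0, 1.0]             # 3D point in homogeneous coordinates
x, y, w = mat_vec(K, mat_vec(Rt, q))
u, v = x / w, y / w                   # perspective division
# u = 320 + 800 * (1 / 4) = 520,  v = 240 + 800 * (0.5 / 4) = 340
```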

### Perspective Projection (Pinhole Camera, eye at origin)

(Figure: point P(x, y, z) viewed along the x and y axes, projecting onto the projection plane at distance d.)

$$
x_p = \frac{x}{z/d}, \qquad y_p = \frac{y}{z/d}, \qquad
M_{per} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1/d & 0
\end{bmatrix}
$$
### Perspective Division

However, W ≠ 1, so we must divide by W to return from homogeneous coordinates:

$$
\begin{bmatrix} X \\ Y \\ Z \\ W \end{bmatrix}
= M_{per} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= \begin{bmatrix} x \\ y \\ z \\ z/d \end{bmatrix},
\qquad
(x_p,\, y_p,\, z_p)
= \left( \frac{X}{W},\, \frac{Y}{W},\, \frac{Z}{W} \right)
= \left( \frac{x}{z/d},\, \frac{y}{z/d},\, d \right)
$$

### Marker Detection

• Convert to a grayscale image
• Adaptive image thresholding using the gradient of the logarithm of intensities
• Cheaper threshold: compute locally and interpolate

### Quad Finding

• Find edges (a black pixel after a white one) on every n-th line
• Follow the edge in a 4-connected neighborhood, until the loop is closed or the border is hit
• Start at a and walk the contour, searching for p₁ at maximum distance
• Compute the centroid m
• Find corners p₂, p₃ on either side of the line d₁,ₘ from p₁ to m
• Find the farthest point p₄ from the line d₂,₃ = (p₂, p₃)
• Determine the orientation from the black corner at sᵢ = (pᵢ + m)/2

### Pose Estimation from Homography (projective transformation)

• Marker corners lie in the plane ∏': q_z = 0
• Express a 3D point q ∈ ∏' as the homogeneous point q' = [q_x, q_y, 1]
• The mapping is a homography: **p = H q'**
• Estimate H using the direct linear transformation
• Recover the pose R, t from H = K[R_C1 | R_C2 | t]

### Homography Decomposition

To compute a pose from a homography, the rotation components of the homography need to be ortho-normalized first (R_C3 is then obtained as the cross product of R_C1 and R_C2). See the next slide.
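The ortho-normalization step can be sketched with Gram-Schmidt in pure Python; the input columns below are assumed example values, slightly non-orthogonal as typically recovered from a noisy homography:

```python
# A minimal sketch: ortho-normalize the two rotation columns recovered from a
# homography (Gram-Schmidt), then complete R_C3 as their cross product.
import math

def norm(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def orthonormalize(h1, h2):
    """Return orthonormal r1, r2 and r3 = r1 x r2."""
    r1 = norm(h1)
    # Remove the r1 component from h2, then normalize.
    r2 = norm([x - dot(r1, h2) * r1[i] for i, x in enumerate(h2)])
    return r1, r2, cross(r1, r2)

r1, r2, r3 = orthonormalize([1.0, 0.02, 0.0], [0.03, 1.0, 0.0])
# r1 . r2 is ~0 and all three vectors have unit length afterwards
```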

### 3D rotation (Note: the rotation components of the matrix need to be ortho-normal)

The matrices R_z(θ), R_x(θ), and R_y(θ) are the same as those given earlier in the 3D-rotation section.

### Pose refinement

• Iteratively minimize the reprojection error
• Displacement of known 3D points q_i projected by [R|t] from their known image locations p_i
• Minimize using Gauss–Newton or Levenberg–Marquardt

### Multiple-Camera Infrared Tracking

1. Blob detection in all images (centroid of connected regions)
2. Establish point correspondences between blobs using epipolar geometry
3. Triangulation of 3D points from multiple 2D points
4. Matching of 3D candidate points to target points
5. Compute the target pose (absolute orientation)

### Epipolar Line

(For example, the line e_R X_R.)

### Cross Product in Matrix Representation

The cross product can be written as multiplication by a skew-symmetric matrix:

$$
[\mathbf{a}]_\times =
\begin{bmatrix}
0 & -a_3 & a_2 \\
a_3 & 0 & -a_1 \\
-a_2 & a_1 & 0
\end{bmatrix},
\qquad
\mathbf{a} \times \mathbf{b} = [\mathbf{a}]_\times \mathbf{b}
$$
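The identity a × b = [a]ₓ b is easy to check numerically:

```python
# A minimal sketch verifying that the skew-symmetric matrix [a]_x reproduces
# the cross product: [a]_x @ b == a x b.

def skew(a):
    """Skew-symmetric matrix [a]_x such that skew(a) @ b == a x b."""
    return [[    0, -a[2],  a[1]],
            [ a[2],     0, -a[0]],
            [-a[1],  a[0],     0]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
# mat_vec(skew(a), b) == cross(a, b) == [-3.0, 6.0, -3.0]
```

This is exactly the [t]ₓ factor that appears inside the fundamental matrix on the next slide.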

### Epipolar Geometry

(Figure: cameras **c**₁, **c**₂ with image planes π₁, π₂, epipoles **e**₁, **e**₂, a scene point **q** projecting to **p**₁, **p**₂, and the epipolar line **l**.)

Known: **K**₁, **K**₂, **[R|t]** (from the 1st to the 2nd camera). Search for **p**₂ on the epipolar line **l**:

$$
\mathbf{l} = F\,\mathbf{p}_1, \qquad \mathbf{p}_2^\top F\, \mathbf{p}_1 = 0,
\qquad F = K_2^{-\top}\,[\mathbf{t}]_\times\, R\, K_1^{-1}
$$

where F is the fundamental matrix. (Camera C1, after a rotation and a translation, becomes camera C2; the task is to recover the amounts of rotation and translation.)

Fig. 4.3

### Multiple View Geometry: Essential Matrix and Fundamental Matrix

### Essential Matrix and Fundamental Matrix: II (where A1, A2 = K1, K2 and m1, m2 = p1, p2 in Fig. 4.3)

### Epipolar Geometry: Essential Matrix

### Epipolar Geometry: Essential Matrix II

### Triangulation from Two Cameras

• Just two cameras
  • Two rays starting at **c**₁, **c**₂ in the directions of points **p**₁, **p**₂
  • Find the point **q** closest to the two rays
• More than two cameras
  • Build an equation system from the relationships **p** = M **q**
  • Each camera yields two equations
  • Solve using the direct linear transform

### Solution

If p = M q, then p × (M q) = 0. Stacking these cross-product constraints from all cameras gives a homogeneous linear system that is solved for q.
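The two-camera case can be sketched geometrically: find the point closest to both rays as the midpoint of their common perpendicular (the ray origins and directions below are assumed example values):

```python
# A minimal sketch of two-ray triangulation: the point closest to rays
# c1 + s*d1 and c2 + t*d2 is the midpoint of the shortest connecting segment.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def along(c, d, s):
    return [x + s * y for x, y in zip(c, d)]

def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two non-parallel rays."""
    r = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, r), dot(d2, r)
    denom = a * c - b * b
    s = (b * e - c * d) / denom       # closest parameter on ray 1
    t = (a * e - b * d) / denom       # closest parameter on ray 2
    p1, p2 = along(c1, d1, s), along(c2, d2, t)
    return [(x + y) / 2 for x, y in zip(p1, p2)]

# Two rays that intersect exactly at (0, 0, 5):
q = triangulate([-1.0, 0, 0], [1.0, 0, 5.0], [1.0, 0, 0], [-1.0, 0, 5.0])
# q == [0.0, 0.0, 5.0]
```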

### Target Matching

• Candidate points qᵢ, target points rᵢ, i ≤ j
• Geometric signature of the target
  • Distances and angles between pairs of points
• Compare to the candidate's signature
• Among good-enough matches, select the best one

### Absolute Orientation

• Which R, t aligns the candidate points with the target points?
• Determine t from qᵢ to rᵢ
  • Difference of the centroids of qᵢ and rᵢ
• Determine R from qᵢ to rᵢ
  • Define an intermediate coordinate system
    • Origin at q₁
    • X-axis x_q aligned with q₂ − q₁
    • Y-axis y_q orthogonal to x_q, lying in the plane (q₁, q₂, q₃)
    • Z-axis z_q is the cross product of x_q and y_q
  • [x_q | y_q | z_q] is a rotation into the intermediate coordinate system
  • Do the same from the rᵢ side and concatenate the rotations

### FAST

FAST searches for a contiguous sequence of pixels on a circle (16 pixels, numbered 1-16 around the candidate corner) that are consistently lighter or darker than the center. An early exit can be taken by first testing only the pixels at the top, bottom, left, and right. Often an improved detection method based on machine learning and a precompiled decision tree is used, allowing better generalization for arc lengths smaller than 12 pixels.

Image: Gerhard Reitmayr
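The basic FAST criterion can be sketched directly on a list of 16 sampled intensities (the threshold and arc length below are assumed example values; production detectors use the decision-tree variant mentioned above):

```python
# A minimal sketch of the FAST corner test: on a 16-pixel circle, look for a
# contiguous arc of at least `arc_len` pixels that are all brighter (or all
# darker) than the center by more than a threshold.

def is_fast_corner(center, circle, threshold=20, arc_len=12):
    """circle: 16 intensities sampled in order around the center pixel."""
    n = len(circle)
    for sign in (1, -1):              # brighter pass, then darker pass
        run, best = 0, 0
        # Walk the circle twice so wrap-around arcs are counted.
        for i in range(2 * n):
            if sign * (circle[i % n] - center) > threshold:
                run += 1
                best = max(best, run)
            else:
                run = 0
        if best >= arc_len:
            return True
    return False

flat = [100] * 16                     # no contrast: not a corner
corner = [160] * 12 + [100] * 4       # 12 contiguous brighter pixels: a corner
# is_fast_corner(100, flat) is False; is_fast_corner(100, corner) is True
```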

### SIFT

SIFT determines gradient vectors for every pixel of the 8 × 8 image patch. A 2 × 2 descriptor array with 8-bin histograms is built, relating cumulative gradient-vector magnitude to gradient orientation (each histogram covering orientations 0…2π).


### Overview of SIFT Descriptor

### • Keypoint localization

### • Searching for minima and maxima in scale space (also provides scale estimation for scale invariance)

### • Requires building pyramid of Difference of Gaussian

### • Major performance bottleneck of SIFT

### • Feature description

### • Estimation of dominant 2D feature orientations

### • Orientation histogram of 4x4 sub-regions (128 bins)

### • Feature matching

### • Feature database stored as k-d tree for sub-linear search time



### Overall Rotation Check

### • SIFT provides keypoint rotation for free

### • All keypoints must have same relative rotation (90 degrees clockwise in this case)

### • Look at the histogram of relative rotations (0…2π) and keep only the peaks (majority by a voting algorithm)


### Line Test

• Pick two correspondences → they define a line
• All others must be on the same side (the red line indicates a pair that is not on the same side)

### Three-Point Pose (P3P)

P3P computes the distance dᵢ from the camera center **c** to each 3D point **q**ᵢ, given the observed image points **p**ᵢ, **p**ⱼ, the angles Φᵢ,ⱼ between the viewing rays, and the known distances dᵢⱼ between the 3D points.

### Tukey Estimator


### Tracking by Detection

• This is what most "trackers" do…
• Targets are detected every frame
• Popular because detection and pose estimation are solved simultaneously

### Detection and Tracking

Tracking and detection are complementary approaches. Starting from detection, once the tracking target is detected, it is tracked incrementally; if the target is lost or not detected, detection is activated again.

• Detection: recognizes the target type, detects the target, and initializes the camera pose
• Incremental tracking: fast; robust to blur, lighting changes, and tilt

### Motion Model

• Active search in 2D: a 2D motion model predicts the displacement t = p_t − p_{t−1} in image space and places a search window around the predicted position p_t.
• Active search in 3D: a 3D motion model predicts the camera state x_{t+1} from x_t and x_{t−1} (t = x_t − x_{t−1}) and projects the expected feature position q into a search window.

### Patch Tracking Idea

• Target detection:
  • Find features in the reference image
• Target tracking:
  • Take the previous pose and apply the motion model
  • Get an estimate of what we are looking for
  • Create affinely warped patches of the reference features
    • These closely resemble how the feature should look in the camera image
  • Project the patches into the camera image and match using normalized cross-correlation (NCC)
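The NCC matching score mentioned above can be sketched on flattened intensity patches; its invariance to gain and offset is exactly why it tolerates lighting changes:

```python
# A minimal sketch of normalized cross-correlation (NCC) between two patches.
# NCC is invariant to affine intensity changes (gain and offset).
import math

def ncc(a, b):
    """NCC of two equally sized intensity patches (flattened lists)."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return num / (da * db)

patch = [10, 20, 30, 40]
brighter = [2 * p + 5 for p in patch]   # same structure, different lighting
# ncc(patch, brighter) is ~1.0: a perfect match despite the intensity change
```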

### Patch Tracking

A patch taken from the template image is affinely warped using the camera pose estimated from a motion model; the warped patch is then compared to the current camera image.

### Hierarchical Feature Matching

First, only a small number of interest points is used to obtain an initial estimate of the camera pose; then the full set of interest points is considered at full resolution, but with a much smaller search window.

### PatchTracker Workflow Analysis

• Affinely warped patches allow very strong affine transformations (tilt close to 90°)
• NCC tolerates severe lighting changes
• A 5-pixel search radius at half resolution allows a "wide" baseline: 5 × 2 × 20 Hz = 200 pixels/s
• Tracking 100 features at full resolution strongly reduces jitter


### When Does It Break? (1/2)


### When Does It Break? (2/2)


### Orthogonal Strengths and Weaknesses

| | SIFT/Ferns (detection) | PatchTracker (tracking) |
| --- | --- | --- |
| Recognize many targets | ✓ | |
| Detect target | ✓ | |
| Initialize tracking | ✓ | |
| Speed | | ✓ |
| Robust to blur | | ✓ |
| Robust to tilt | | ✓ |
| Robust to lighting changes | | ✓ |

### Windowed Bundle Adjustment

Windowed bundle adjustment limits the computational effort by optimizing only over neighboring camera poses.

### Visual Tracking Approaches

• Marker-based tracking with artificial features
  • Make a model before tracking
• Model-based tracking with natural features
  • Acquire a model before tracking
**• Simultaneous localization and mapping (SLAM)**
  *• Build a model while tracking it*
  • Example 1: Parallel Tracking and Mapping with a monocular camera
  • Example 2: KinectFusion

### Parallel Tracking and Mapping

Parallel tracking and mapping uses two concurrent threads, one for tracking and one for mapping, which run at different speeds. The tracking thread estimates the camera pose for every frame and passes new keyframes to the mapping thread; the mapping thread extends and improves the map at a slower update rate.

### Parallel Tracking and Mapping

The video stream feeds new frames to the tracking thread (FAST), which outputs the tracked local pose; the mapping thread (SLOW) receives new keyframes and returns map updates. This is simultaneous localization and mapping (SLAM) in small workspaces (Klein/Drummond, U. Cambridge).

### Keyframe SLAM

### • Standard SLAM: Repeat until tracking is lost

### • Extract features from live image (or track features in image)

### • Match features to existing map

### • Determine camera pose from matched 3D points

### • Try to triangulate new features to get new 3D points

### • Insert any new 3D points into map (or update existing map points)

### • Keyframe SLAM

### • Build map only from selected keyframes

### • Split tracking and mapping into two threads

### • Tracking at frame rate, mapping at a slower rate

### Multi-Threaded SLAM

**Tracking**

### • Estimate camera pose

### • Must run at 30Hz

### • As robust and accurate as possible

### • Track/render loop with two tracking stages

### • Coarse stage with ~50 big features

### • Fine stage with ~1000 random features

**Mapping**

### • Get new keyframe

### • Add new map points

### • Optimize map

### • Map Maintenance


### Keyframes

### • Keyframes are only added if

### • Baseline to other keyframes large enough

### • Tracking quality is good

### • When a keyframe is added

### • The mapping thread stops current maintenance work

### • All points in the map are measured in the keyframe

### • New map points are found and added to the map


### New Map Points

### • Find as many map points as possible

**• Check all maximal FAST corners in the keyframe**

### • Check Shi-Tomasi score

### • Check if already in map

### • Epipolar search in a neighboring keyframe

### • Triangulate matches and add to map

### • Repeat in four image pyramid levels


### Map Optimization and Maintenance

• Uses bundle adjustment
• Adjusts map point positions + keyframe poses
• Minimizes the reprojection error of all points in all keyframes (or only the last N keyframes)
• Improves the map in idle time (camera not exploring)
• Tries to measure new map features in old keyframes

### Small Blurry Images

Computed by resampling 640×480 to 40×30 and blurring with a Gaussian kernel of size 5 pixels; used to re-detect the pose after a tracking failure.

### Tracking and Mapping on Mobiles

**Panoramic SLAM**

### • Only rotation, user must stay in one place

### • Works instantly

**Full 6DOF SLAM**

### • User can move freely

### • Needs baseline – walk several meters

Image: Christian Pirchheim


### Hybrid SLAM

A SLAM system that can handle both general 6DOF motion and pure rotation has the advantage that the user is not constrained to a certain type of motion. It also presents the opportunity to recover 3D features (magenta) from panoramic features (cyan) when additional views become available.

Image: Christian Pirchheim

### Hybrid SLAM Results


The combination of 6DOF and panoramic SLAM delivers much more robust tracking performance during arbitrary user motion. (top) Conventional 6DOF SLAM can track the pose for only 53% of the frames. (bottom) Combined SLAM can track the pose in 98% of the frames.

Image: Christian Pirchheim

### Kinect Fusion Overview

• Simultaneous localization and mapping with a depth sensor
• For 3D object scanning and model creation
• The user moves the depth sensor through the scene
• Position and orientation are tracked by ICP
• Depth maps are combined to create a model
• The model is used for future tracking

Pipeline: (1) raw depth image → (2) ICP camera tracking → (3) volumetric integration into the TSDF volume → (4) volumetric raycasting.

### KinectFusion Output

### • Tracking data

### • Transformation matrix from the current to the first position + orientation

### • Model

### • Volumetric description of the scene

### • Can be converted to a polygon mesh or rendered directly using raycasting

### • Less noise and more details than the collected depth samples

### • No holes


### Kinect Fusion

• KinectFusion: Real-Time Dense Surface Mapping and Tracking
  • 2011 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR)

### KinectFusion Tracking

• Convert the input depth map to a point cloud (color ignored)
• Compute normals for each point
• Find the transformation from the model to the current depth map
  • Iterative Closest Point (ICP)
  • ICP can only track small motions, which requires slow movement
• Outliers come as a by-product
  • Observed points that do not fit the model
  • Can be used to detect moving objects

### Tracking with Iterative Closest Points

Iterative Closest Point algorithm:

1. Project the current depth map using the last known position
2. Find closest point pairs between the depth map and the model
3. If all distances < threshold: done
4. From the list of correspondences, compute the pose (solve a linear equation system in a least-squares sense)
5. Use the pose to reproject the depth map; go to step 2

### Truncated Signed Distance Function (TSDF)

For every point in space, store the (truncated) signed distance to the closest isosurface, clamped to [−1, 1].

(Figure: example voxel grids holding TSDF values between −1.0 and 1.0; the zero crossing between negative and positive values marks the surface.)

### Volumetric Integration

### • Mapping step of SLAM

### • Map is a voxel grid (3D array) of scalar values

### • Current depth map is transformed using tracked pose

### • Values close to the depth map are updated

### • Effect of integrating based on TSDF

### • Noise is smoothed by averaging

### • Updating weight can be used to allow for changes in the scene

### • Limited range of influence (number of updated voxels limited)

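The per-voxel update behind "noise is smoothed by averaging" can be sketched as a weighted running average (the weight cap is an assumed example value):

```python
# A minimal sketch of TSDF voxel updating by weighted running average, the
# integration rule that smooths depth noise across frames.

def update_voxel(tsdf, weight, new_tsdf, new_weight=1.0, max_weight=64.0):
    """Fuse a new truncated SDF measurement into a voxel."""
    fused = (tsdf * weight + new_tsdf * new_weight) / (weight + new_weight)
    # Capping the weight keeps the voxel responsive to changes in the scene.
    return fused, min(weight + new_weight, max_weight)

# Noisy measurements of a surface whose true TSDF value is 0.1:
v, w = 0.0, 0.0
for m in [0.2, 0.0, 0.15, 0.05]:
    v, w = update_voxel(v, w, m)
# v is ~0.1 (the running average), w == 4.0
```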

### KinectFusion Rendering

• Usually done by raycasting: viewing rays are searched for the zero crossing
  • The surface is the 0-level set
• Can also be exported as a mesh
  • The Microsoft Kinect SDK applies Marching Cubes
• Texturing requires a set of color images + transformations
  • Texture coordinates can be found by projecting the vertices or voxels

### Sparse Point Cloud


The main square of Graz, Austria, represented as a 3D point cloud computed using SFM from a large set of panoramic images.

Courtesy of Clemens Arth

### View Cells


After reconstruction, the relevant parts of an urban area can be subdivided into cells. By preselecting cells based on a GPS measurement as a source, one can substantially prune the relevant portions of the reconstruction database.

Image: Clemens Arth

### Prune Search Space with Sensor Priors

• GPS: only search near the position prior
• Compass: only search in the approximate heading
• Accelerometer/gravity: only consider features with the right orientation

### Improvement with Sensor Priors

• 15% higher localization success rate
• Much faster

### Prune Search Space with GIS

• OpenStreetMap is now available everywhere!
• Offline:
  • Align features with facades during reconstruction
  • The reconstructed model is less distorted
• Online:
  • Compute visibility sets
  • Visibility from facades (GPU)
  • Prune the database using visibility
  • Detect building outlines in the image [in progress]

### Sensor Priors


A magnetometer (compass) can be used as a source of prior information to narrow the search for point correspondences to those with a normal facing approximately toward the user.

Image: Clemens Arth

### Gravity Aligned Features


Features with an orientation aligned to gravity rather than to a visual attribute such as the gradient can be matched more reliably.

Image: Clemens Arth

### Visibility


The potentially visible set for the central square contains the street segments immediately connected to the square (blue arrows), but not the street segments after one or more turns (dashed red lines).

### Parallel Tracking, Mapping and Localization

Conventional SLAM performs tracking and mapping simultaneously on a mobile client device: the video stream feeds new frames to tracking (FAST), which exchanges new keyframes and map updates with mapping (SLOW) and outputs the tracked global pose. By adding a localization server (SLOWEST), a third concurrent activity is added: matching new keyframes against a wide-area database of visual features to obtain a global pose. Client and server operate independently, so the client can always run at the highest frame rate.

### Tracking in Panoramas

With panoramic SLAM, the user may perform only rotational motion, such as when exploring the immediate environment.

Image: Clemens Arth
Courtesy of Daniel Wagner

### Panorama Matching

The yellow lines show the feature matches obtained from a panoramic image. Note how certain directions, where facades are directly observed, perform very well, while directions facing down a street perform poorly. This illustrates why a wide field of view is needed for reliable outdoor localization.

Image: Clemens Arth

### Outdoor Localization Result


Multiple images from a sequence tracked with 6DOF SLAM on a client, while a localization server provides the global pose used to overlay the building outlines with transparent yellow structures.

Image: Jonathan Ventura and Clemens Arth

### Outdoor SLAM Result


This SLAM sequence starts with tracking a facade (overlaid in yellow), for which a global pose has been determined by a server. The images in the bottom row cannot continue tracking with information known to the server;

the poster in the foreground, which has been incorporated into the SLAM map, is used for tracking instead.

Image: Jonathan Ventura and Clemens Arth