# Chapter 4: Computer Vision

## Computer Vision

### Fundamentals

Why use a homogeneous coordinate system?
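A short illustration of the answer (a NumPy sketch, not from the book): in homogeneous coordinates a 2D point (x, y) becomes (x, y, 1), so translation — which is not a linear map in Cartesian coordinates — becomes an ordinary matrix multiplication, and a whole chain of rotations, translations, and projections collapses into a single 3 × 3 matrix.

```python
import numpy as np

# A 2D point (x, y) becomes (x, y, 1) in homogeneous coordinates.
# Translation then becomes a plain matrix multiplication, so it can be
# composed with rotations and projections by multiplying 3x3 matrices.

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])                    # the point (1, 0)
T = translation(2.0, 3.0) @ rotation(np.pi / 2)  # rotate 90 deg, then translate
q = T @ p                                        # one combined transform
print(q[:2] / q[2])                              # -> [2. 4.]
```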


### Rotation Representation with Quaternions

www.augmentedrealitybook.org Computer Vision 15


### Point P, after rotation by the quaternion q, is P′

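As a concrete sketch (plain NumPy, with hypothetical helper names): the rotated point is obtained as the quaternion product P′ = q · P · q*, where P is embedded as the pure quaternion (0, p) and q* is the conjugate of the unit quaternion q.

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions given as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(q, p):
    # P' = q * P * q_conjugate, with P embedded as the pure quaternion (0, p)
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.array([0.0, *p])), q_conj)[1:]

# Unit quaternion for a 90-degree rotation about the z-axis:
theta = np.pi / 2
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
print(rotate(q, [1.0, 0.0, 0.0]))   # -> approximately [0, 1, 0]
```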

### Interpolation between two rotations is straightforward with quaternions, one being transformation 1 (or some other initial rotation) and the other being the intended final rotation. This is more problematic with other representations of rotations.
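A common way to perform this interpolation is spherical linear interpolation (slerp); the following NumPy sketch is illustrative and not necessarily the exact formulation used in the book:

```python
import numpy as np

def slerp(q0, q1, t):
    # Spherical linear interpolation between unit quaternions q0 and q1.
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # flip one quaternion to take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)   # angle between the two quaternions
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# Halfway between the identity and a 90-degree rotation about z:
q0 = np.array([1.0, 0.0, 0.0, 0.0])
q1 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
qh = slerp(q0, q1, 0.5)     # a 45-degree rotation about z
```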


### The second major version of OpenCV, OpenCV 2.0, was released in October 2009


### In OpenCV, use the function `solve(A, b, x, DECOMP_SVD);`
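For readers without OpenCV at hand, the same SVD-based least-squares solution of A·x = b can be sketched in NumPy (the matrix values are illustrative):

```python
import numpy as np

# cv::solve(A, b, x, DECOMP_SVD) computes the least-squares solution of
# A x = b via singular value decomposition; NumPy's lstsq is equivalent:
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # overdetermined: 3 equations, 2 unknowns
b = np.array([6.0, 0.0, 0.0])

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Or explicitly via the pseudo-inverse built from the SVD:
U, S, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / S)
```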


### 5. Use final pose


Image: Daniel Wagner


### • Assume K is known (for now)

[Figure: pinhole camera — center of projection c, optical axis, image plane π with image coordinates (u, v) at focal length f, scene point p projecting to image point q]

### Pinhole Camera (eye at origin)

[Figure: views along the x axis and the y axis — scene point P(x, y, z) projects onto the projection plane at distance d]

x_p = d·x / z, y_p = d·y / z

### • Cheaper threshold: compute locally and interpolate
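One way to realize this (an illustrative NumPy sketch, not a specific library's implementation): compute mean intensities on a coarse grid of cells, bilinearly interpolate them back to full resolution, and threshold each pixel against its interpolated local value.

```python
import numpy as np

def interpolated_threshold(img, grid=4, offset=5.0):
    # Compute the mean intensity in coarse cells of size grid x grid,
    # then bilinearly interpolate those means to a per-pixel threshold.
    h, w = img.shape
    gh, gw = h // grid, w // grid
    means = img[:gh * grid, :gw * grid].reshape(gh, grid, gw, grid).mean(axis=(1, 3))
    # Bilinear interpolation of the cell means back to full resolution:
    ys = np.linspace(0, gh - 1, h)
    xs = np.linspace(0, gw - 1, w)
    y0 = np.clip(ys.astype(int), 0, gh - 2)
    x0 = np.clip(xs.astype(int), 0, gw - 2)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    t = ((1 - fy) * (1 - fx) * means[y0][:, x0]
         + (1 - fy) * fx * means[y0][:, x0 + 1]
         + fy * (1 - fx) * means[y0 + 1][:, x0]
         + fy * fx * means[y0 + 1][:, x0 + 1])
    return img > (t - offset)   # True where a pixel exceeds its local threshold

img = np.full((16, 16), 50.0)
img[8, 8] = 255.0               # one bright pixel on a uniform background
out = interpolated_threshold(img)
```

Note how the bright pixel raises the interpolated threshold of its neighborhood, which a single global threshold could not do.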


### • Determine orientation from black corner at s


### Camera Pose [R | t]

### Homography Decomposition

[Figure: the homography column vectors h1, h2 and the orthonormalized rotation columns RC1, RC2, RC3]

To compute a pose from a homography, the rotation components of the homography need to be ortho-normalized first. See the next slide.
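A minimal NumPy sketch of one common ortho-normalization (plain Gram–Schmidt; the slides may use a symmetric variant based on the bisectors of h1 and h2), assuming the homography has already been normalized by K⁻¹, i.e. H ≈ [r1 r2 t] up to scale:

```python
import numpy as np

def pose_from_homography(H):
    # Recover [R | t] from a plane-induced homography H ~ [r1 r2 t]
    # (assumes H was pre-multiplied by K^-1; helper name is ours).
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    s = np.linalg.norm(h1)            # scale: r1 must be a unit vector
    r1 = h1 / s
    r2 = h2 - np.dot(r1, h2) * r1     # Gram-Schmidt: make r2 orthogonal to r1
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)             # right-handed third rotation column
    t = h3 / s
    return np.column_stack([r1, r2, r3]), t

# Check on a synthetic homography built from a known pose:
ct, st = np.cos(0.5), np.sin(0.5)
R_true = np.array([[ct, -st, 0.0],
                   [st,  ct, 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
H = 2.0 * np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R, t = pose_from_homography(H)
```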


### • Minimize using Gauss-Newton or Levenberg-Marquardt


### 5. Compute target pose (absolute orientation)


### Epipolar line (for example, the line through e_R and x_R)


### Cross Product in Matrix Representation


### • Therefore, a × b = [a]× · b, where [a]× is the skew-symmetric matrix of a
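This identity is easy to check numerically (NumPy sketch):

```python
import numpy as np

def skew(a):
    # [a]x: the skew-symmetric matrix with skew(a) @ b == np.cross(a, b)
    return np.array([[0.0,  -a[2],  a[1]],
                     [a[2],  0.0,  -a[0]],
                     [-a[1], a[0],  0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
```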


### Epipolar Constraint

P2ᵀ · F · P1 = 0, where F is the fundamental matrix: F = K2⁻ᵀ · [t]× · R · K1⁻¹

Fig. 4.3
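The formula can be checked numerically with a synthetic two-camera setup (an illustrative NumPy sketch; the K, R, t values are made up): for corresponding image points P1 and P2, the epipolar constraint P2ᵀ·F·P1 = 0 must hold.

```python
import numpy as np

def skew(t):
    return np.array([[0.0,  -t[2],  t[1]],
                     [t[2],  0.0,  -t[0]],
                     [-t[1], t[0],  0.0]])

def fundamental(K1, K2, R, t):
    # F = K2^-T [t]x R K1^-1  (the notation of Fig. 4.3)
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

# Sanity check with a synthetic point seen by two cameras:
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])       # pure sideways translation
X = np.array([0.2, -0.1, 4.0])      # 3D point in camera-1 coordinates
p1 = K @ X                          # homogeneous projection into image 1
p2 = K @ (R @ X + t)                # homogeneous projection into image 2
F = fundamental(K, K, R, t)
print(p2 @ F @ p1)                  # -> approximately 0
```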


### Multiple View Geometry: Essential Matrix and Fundamental Matrix


### Essential Matrix and Fundamental Matrix II (where A1, A2 = K1, K2 and m1, m2 = P1, P2 in Fig. 4.3)


### Epipolar geometry: Essential matrix


### Epipolar geometry: Essential matrix II


### • Solve using direct linear transform

[Figure: triangulation — rays from the camera centers c1 and c2 through image points p1 and p2 pass near the 3D point q at distances d1 and d2]

### P × (M·q) = 0, solve for M


### • Among good enough matches, select the best one


### • Define intermediate coordinate system

• Origin at q1

• X-axis xq aligned with q2-q1

• Y-axis yq orthogonal to xq, lies in plane (q1,q2,q3)

• Z-axis zq is cross-product of xq and yq


### • Do the same from the ri side and concatenate the rotations
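The construction above can be sketched in NumPy (hypothetical helper name). The columns of the returned matrix are the frame's axes expressed in world coordinates, so the rotation aligning the q-side frame with the r-side frame is Fr · Fqᵀ:

```python
import numpy as np

def frame_from_points(q1, q2, q3):
    # Intermediate coordinate system as described above:
    # origin at q1, x along q2-q1, y in the (q1,q2,q3) plane, z = x cross y.
    x = q2 - q1
    x /= np.linalg.norm(x)
    y = (q3 - q1) - np.dot(q3 - q1, x) * x   # remove the x component
    y /= np.linalg.norm(y)
    z = np.cross(x, y)
    return np.column_stack([x, y, z])        # frame axes as matrix columns

q = [np.array(v, float) for v in ([0, 0, 0], [1, 0, 0], [0, 1, 0])]
r = [np.array(v, float) for v in ([5, 0, 0], [5, 1, 0], [5, 0, 1])]
Fq = frame_from_points(*q)
Fr = frame_from_points(*r)
R = Fr @ Fq.T   # rotation taking the q-triangle's orientation to the r-triangle's
```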


### FAST

[Figure: the 16 pixels on a circle around the candidate corner, numbered 1–16]

FAST searches for a contiguous sequence of pixels on a circle which are consistently lighter or darker than the center. An early exit can be achieved by first testing only the pixels at the top, bottom, left, and right (right image). Often, an improved detection method based on machine learning and a precompiled decision tree is used, allowing better generalization for arc lengths smaller than 12 pixels.


Image: Gerhard Reitmayr


### SIFT

SIFT determines gradient vectors (with orientations in 0…2π) for every pixel of the 8 × 8 image patch. A 2 × 2 descriptor array with 8-bin histograms, relating cumulative gradient vector magnitude to gradient orientation, is built.


### • Feature database stored as k-d tree for sub-linear search time


### Only peaks (majority voting algorithm)


### • All others must be on the same side (red line indicates not on the same side)


### P3P

P3P computes the distance d_i from the camera center c to a 3D point q_i


### Tukey Estimator
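As a sketch (the standard Tukey biweight with the usual 95%-efficiency tuning constant c ≈ 4.685; the function name is ours), the estimator assigns each residual a weight that smoothly falls to zero at the cutoff, so gross outliers stop influencing the fit — for example inside an iteratively reweighted least-squares loop:

```python
import numpy as np

def tukey_weight(e, c=4.685):
    # Tukey biweight: weight (1 - (e/c)^2)^2 inside the cutoff c, 0 outside,
    # so gross outliers get zero influence (unlike plain least squares).
    e = np.asarray(e, float)
    w = (1.0 - (e / c) ** 2) ** 2
    return np.where(np.abs(e) < c, w, 0.0)

residuals = np.array([0.0, 1.0, 3.0, 50.0])   # the last one is an outlier
weights = tukey_weight(residuals)             # outlier gets weight 0
```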


### are solved simultaneously


### Detection and Tracking

[Figure: state machine — Start → Detection (recognize target type, detect target, initialize camera pose; robust to blur, lighting changes, and tilt) → Incremental tracking (fast) while tracking is ok; on tracking target lost or not detected, return to Detection]

Tracking and detection are complementary approaches. After successful detection, the target is tracked incrementally. If the target is lost, detection is activated again.


### Motion Model

[Figure: active search in 2D and 3D — a 3D motion model predicts x_{t+1} from Δt = x_t − x_{t−1}; a 2D motion model predicts the next position of p from Δt = p_t − p_{t−1} and defines a search window in image space around the predicted point q]


### • Project patches into camera image and use normalized cross correlation (NCC) to match
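NCC itself is only a few lines (NumPy sketch); because the patch means and magnitudes are normalized out, the score is invariant to affine lighting changes (gain and offset):

```python
import numpy as np

def ncc(a, b):
    # Normalized cross correlation of two equally sized patches:
    # subtract the means, correlate, normalize by the magnitudes.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom

patch = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
same_but_brighter = 2.0 * patch + 10.0   # gain and offset change
score = ncc(patch, same_but_brighter)    # -> 1.0, a perfect match
```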


### Patch Tracking

A patch taken from the template image is affinely warped using the camera pose estimated from a motion model; the warped patch is then compared to the current camera image.


### Hierarchical Feature Matching

First, only a small number of interest points is used to obtain a first estimate of the camera pose. Then, the full set of interest points is considered at full resolution, but with a much smaller search window.


### • Tracking 100 features at full-res strongly reduces jitter


### When Does It Break? (1/2)


### When Does It Break? (2/2)


### Robust to lighting changes  


### Windowed Bundle Adjustment

Windowed bundle adjustment limits the computational effort by optimizing only over neighboring camera poses.

### • Example 2: KinectFusion


### Parallel Tracking and Mapping

[Figure: Tracking (estimates the camera pose for every frame) passes new keyframes to Mapping (extends and improves the map at a slow update rate)]

Parallel tracking and mapping uses two concurrent threads, one for tracking and one for mapping, which run at different speeds.


### Parallel Tracking and Mapping

[Figure: video stream → new frames → Tracking (fast), which outputs the tracked local pose and sends new keyframes to Mapping (slow) — simultaneous localization and mapping (SLAM) in small workspaces]

Klein/Drummond, U. Cambridge


### • Tracking at framerate, mapping at slower rate


### • Map Maintenance


### • New map points are found and added to the map


### • Repeat in four image pyramid levels


### • Try to measure new map features in old keyframes


### Small Blurry Images

Computed by resampling 640×480 to 40×30, then blurring with a Gaussian kernel of size 5 pixels. Used to re-detect the pose after tracking failure.


### • Needs baseline – walk several meters

Image: Christian Pirchheim


### Hybrid SLAM

A SLAM system that can handle both general 6DOF motion and pure rotation has the advantage that the user is not constrained to a certain type of motion. It also presents the opportunity to recover 3D features (magenta) from panoramic features (cyan).

Image: Christian Pirchheim


### Hybrid SLAM Results


The combination of 6DOF and panoramic SLAM delivers much more robust tracking performance during arbitrary user motion. (top) Conventional 6DOF SLAM can track the pose for only 53% of the frames. (bottom) Combined SLAM can track the pose in 98% of the frames.

Image: Christian Pirchheim


### • Model is used for future tracking

[Figure: KinectFusion pipeline — (1) raw depth image, (2) ICP camera tracking, (3) volumetric integration into the TSDF volume, (4) volumetric raycasting of the model, which feeds back into tracking]


### • No holes


### • KinectFusion: Real-Time Dense Surface Mapping, 2011 10th IEEE International Symposium on Mixed and Augmented Reality


### • Can be used to detect moving objects


### 5. Use pose to reproject depth map, go to step 2


### Truncated Signed Distance Function (TSDF)

[Figure: four 4 × 4 voxel grids storing TSDF values in the range −1.0 … 1.0; the zero crossing between negative and positive values marks the surface]

For every point in space, store the distance to the closest isosurface.
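A 1D sketch of how such values arise (illustrative NumPy code, not the KinectFusion implementation): along one ray, each voxel stores its depth difference to the measured surface, truncated to [−1, 1] and accumulated as a running weighted average over frames.

```python
import numpy as np

def integrate(tsdf, weight, depth, voxel_z, trunc=0.1):
    # Running weighted-average TSDF update for a single ray (1D sketch):
    # positive in front of the observed surface, negative behind it.
    sdf = np.clip((depth - voxel_z) / trunc, -1.0, 1.0)  # signed, truncated
    tsdf[:] = (tsdf * weight + sdf) / (weight + 1)       # weighted average
    weight += 1
    return tsdf, weight

voxel_z = np.linspace(0.0, 1.0, 11)   # voxel centers along one ray
tsdf = np.zeros(11)
weight = np.zeros(11)
tsdf, weight = integrate(tsdf, weight, depth=0.5, voxel_z=voxel_z)
# The zero crossing of the averaged TSDF marks the surface at z = 0.5
```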


### • Limited range of influence (number of updated voxels limited)


### • Texture coordinates can be found by projecting the vertices or voxels


### Sparse Point Cloud


The main square of Graz, Austria, represented as a 3D point cloud computed using SFM from a large set of panoramic images.

Courtesy of Clemens Arth


### View Cells


After reconstruction, the relevant parts of an urban area can be subdivided into cells. By preselecting cells based on a GPS measurement as a source, one can substantially prune the relevant portions of the reconstruction database.

Image: Clemens Arth


### Only consider features with the right orientation

[Figure: sensor priors — GPS, gravity, compass]

### Improvement with Sensor Priors


### • Detecting building outlines in the image [in progress]


### Sensor Priors


A magnetometer (compass) can be used as a source of prior information to narrow the search for point correspondences to those with a normal facing approximately toward the user.

Image: Clemens Arth


### Gravity Aligned Features


Features with an orientation aligned to gravity rather than to a visual attribute such as the gradient can be matched more reliably.

Image: Clemens Arth


### Visibility


The potentially visible set for the central square contains the street segments immediately connected to the square (blue arrows), but not the street segments after one or more turns (dashed red lines).


### Parallel Tracking, Mapping and Localization

[Figure: on the CLIENT, the video stream feeds new frames to Tracking (fast), which outputs the tracked global pose and exchanges data with Mapping (slow); new keyframes are sent to the SERVER, where Matching (slowest) against a wide-area visual feature database returns a global pose]

Conventional SLAM (blue) performs tracking and mapping simultaneously on a mobile client device. By adding a localization server (green), a third concurrent activity is added: matching to a global database of visual features for wide-area localization. Client and server operate independently, so the client can always run at the highest frame rate.


### Tracking in Panoramas


Image: Clemens Arth

With panoramic SLAM, the user may perform only rotational motion, such as when exploring the immediate environment.

Courtesy of Daniel Wagner


### Panorama Matching

The yellow lines show the feature matches obtained from a panoramic image. Note how certain directions, where facades are directly observed, perform very well, while directions facing down a street perform poorly. This illustrates why a wide field of view is needed for reliable outdoor localization.

Image: Clemens Arth


### Outdoor Localization Result


Multiple images from a sequence tracked with 6DOF SLAM on a client, while a localization server provides the global pose used to overlay the building outlines with transparent yellow structures.

Image: Jonathan Ventura and Clemens Arth


### Outdoor SLAM Result

This SLAM sequence starts with tracking a facade (overlaid in yellow), for which a global pose has been determined by a server. The images in the bottom row cannot continue tracking with information known to the server; the poster in the foreground, which has been incorporated into the SLAM map, is used for tracking instead.

Image: Jonathan Ventura and Clemens Arth
