# Chapter 4: Computer Vision

## Computer Vision

### Fundamentals

Why use a homogeneous coordinate system?
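A short illustration of the answer (a NumPy sketch, not from the book): in homogeneous coordinates a 2D point (x, y) becomes (x, y, 1), so translation — which is not a linear map in Cartesian coordinates — becomes an ordinary matrix multiplication, and a whole chain of rotations, translations, and projections collapses into a single 3 × 3 matrix.

```python
import numpy as np

# A 2D point (x, y) becomes (x, y, 1) in homogeneous coordinates.
# Translation then becomes a plain matrix multiplication, so it can be
# composed with rotations and projections by multiplying 3x3 matrices.

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])                    # the point (1, 0)
T = translation(2.0, 3.0) @ rotation(np.pi / 2)  # rotate 90 deg, then translate
q = T @ p                                        # one combined transform
print(q[:2] / q[2])                              # -> [2. 4.]
```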


### Rotation Representation with Quaternions

www.augmentedrealitybook.org Computer Vision 15


### Point P, after rotation by the quaternion q, is P′

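As a concrete sketch (plain NumPy, with hypothetical helper names): the rotated point is obtained as the quaternion product P′ = q · P · q*, where P is embedded as the pure quaternion (0, p) and q* is the conjugate of the unit quaternion q.

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions given as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(q, p):
    # P' = q * P * q_conjugate, with P embedded as the pure quaternion (0, p)
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.array([0.0, *p])), q_conj)[1:]

# Unit quaternion for a 90-degree rotation about the z-axis:
theta = np.pi / 2
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
print(rotate(q, [1.0, 0.0, 0.0]))   # -> approximately [0, 1, 0]
```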

### Interpolation between two rotations is straightforward with quaternions, one being transformation 1 (or some other initial rotation) and the other being the intended final rotation. This is more problematic with other representations of rotations.
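A common way to perform this interpolation is spherical linear interpolation (slerp); the following NumPy sketch is illustrative and not necessarily the exact formulation used in the book:

```python
import numpy as np

def slerp(q0, q1, t):
    # Spherical linear interpolation between unit quaternions q0 and q1.
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # flip one quaternion to take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)   # angle between the two quaternions
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# Halfway between the identity and a 90-degree rotation about z:
q0 = np.array([1.0, 0.0, 0.0, 0.0])
q1 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
qh = slerp(q0, q1, 0.5)     # a 45-degree rotation about z
```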


### The second major version of OpenCV, OpenCV 2.0, was released in October 2009


### In OpenCV, use the function `solve(A, b, x, DECOMP_SVD);`
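For readers without OpenCV at hand, the same SVD-based least-squares solution of A·x = b can be sketched in NumPy (the matrix values are illustrative):

```python
import numpy as np

# cv::solve(A, b, x, DECOMP_SVD) computes the least-squares solution of
# A x = b via singular value decomposition; NumPy's lstsq is equivalent:
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # overdetermined: 3 equations, 2 unknowns
b = np.array([6.0, 0.0, 0.0])

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Or explicitly via the pseudo-inverse built from the SVD:
U, S, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / S)
```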


### 5. Use final pose


Image: Daniel Wagner


### • Assume K is known (for now)

[Figure: pinhole camera — center of projection c, optical axis, image plane π with image coordinates (u, v) at focal length f, scene point p projecting to image point q]

### Pinhole Camera (eye at origin)

[Figure: views along the x axis and the y axis — scene point P(x, y, z) projects onto the projection plane at distance d]

x_p = d·x / z, y_p = d·y / z

### • Cheaper threshold: compute locally and interpolate
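One way to realize this (an illustrative NumPy sketch, not a specific library's implementation): compute mean intensities on a coarse grid of cells, bilinearly interpolate them back to full resolution, and threshold each pixel against its interpolated local value.

```python
import numpy as np

def interpolated_threshold(img, grid=4, offset=5.0):
    # Compute the mean intensity in coarse cells of size grid x grid,
    # then bilinearly interpolate those means to a per-pixel threshold.
    h, w = img.shape
    gh, gw = h // grid, w // grid
    means = img[:gh * grid, :gw * grid].reshape(gh, grid, gw, grid).mean(axis=(1, 3))
    # Bilinear interpolation of the cell means back to full resolution:
    ys = np.linspace(0, gh - 1, h)
    xs = np.linspace(0, gw - 1, w)
    y0 = np.clip(ys.astype(int), 0, gh - 2)
    x0 = np.clip(xs.astype(int), 0, gw - 2)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    t = ((1 - fy) * (1 - fx) * means[y0][:, x0]
         + (1 - fy) * fx * means[y0][:, x0 + 1]
         + fy * (1 - fx) * means[y0 + 1][:, x0]
         + fy * fx * means[y0 + 1][:, x0 + 1])
    return img > (t - offset)   # True where a pixel exceeds its local threshold

img = np.full((16, 16), 50.0)
img[8, 8] = 255.0               # one bright pixel on a uniform background
out = interpolated_threshold(img)
```

Note how the bright pixel raises the interpolated threshold of its neighborhood, which a single global threshold could not do.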


### • Determine orientation from black corner at s


### Camera Pose [R | t]

### Homography Decomposition

[Figure: the homography column vectors h1, h2 and the orthonormalized rotation columns RC1, RC2, RC3]

To compute a pose from a homography, the rotation components of the homography need to be ortho-normalized first. See the next slide.
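A minimal NumPy sketch of one common ortho-normalization (plain Gram–Schmidt; the slides may use a symmetric variant based on the bisectors of h1 and h2), assuming the homography has already been normalized by K⁻¹, i.e. H ≈ [r1 r2 t] up to scale:

```python
import numpy as np

def pose_from_homography(H):
    # Recover [R | t] from a plane-induced homography H ~ [r1 r2 t]
    # (assumes H was pre-multiplied by K^-1; helper name is ours).
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    s = np.linalg.norm(h1)            # scale: r1 must be a unit vector
    r1 = h1 / s
    r2 = h2 - np.dot(r1, h2) * r1     # Gram-Schmidt: make r2 orthogonal to r1
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)             # right-handed third rotation column
    t = h3 / s
    return np.column_stack([r1, r2, r3]), t

# Check on a synthetic homography built from a known pose:
ct, st = np.cos(0.5), np.sin(0.5)
R_true = np.array([[ct, -st, 0.0],
                   [st,  ct, 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
H = 2.0 * np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R, t = pose_from_homography(H)
```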


### • Minimize using Gauss-Newton or Levenberg-Marquardt


### 5. Compute target pose (absolute orientation)


### Epipolar line (for example, the line through e_R and x_R)


### Cross Product in Matrix Representation


### • Therefore, a × b = [a]× · b, where [a]× is the skew-symmetric matrix of a
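This identity is easy to check numerically (NumPy sketch):

```python
import numpy as np

def skew(a):
    # [a]x: the skew-symmetric matrix with skew(a) @ b == np.cross(a, b)
    return np.array([[0.0,  -a[2],  a[1]],
                     [a[2],  0.0,  -a[0]],
                     [-a[1], a[0],  0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
```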


### Epipolar Constraint

P2ᵀ · F · P1 = 0, where F is the fundamental matrix: F = K2⁻ᵀ · [t]× · R · K1⁻¹

Fig. 4.3
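The formula can be checked numerically with a synthetic two-camera setup (an illustrative NumPy sketch; the K, R, t values are made up): for corresponding image points P1 and P2, the epipolar constraint P2ᵀ·F·P1 = 0 must hold.

```python
import numpy as np

def skew(t):
    return np.array([[0.0,  -t[2],  t[1]],
                     [t[2],  0.0,  -t[0]],
                     [-t[1], t[0],  0.0]])

def fundamental(K1, K2, R, t):
    # F = K2^-T [t]x R K1^-1  (the notation of Fig. 4.3)
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

# Sanity check with a synthetic point seen by two cameras:
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])       # pure sideways translation
X = np.array([0.2, -0.1, 4.0])      # 3D point in camera-1 coordinates
p1 = K @ X                          # homogeneous projection into image 1
p2 = K @ (R @ X + t)                # homogeneous projection into image 2
F = fundamental(K, K, R, t)
print(p2 @ F @ p1)                  # -> approximately 0
```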


### Multiple View Geometry: Essential Matrix and Fundamental Matrix


### Essential Matrix and Fundamental Matrix II (where A1, A2 = K1, K2 and m1, m2 = P1, P2 in Fig. 4.3)


### Epipolar geometry: Essential matrix


### Epipolar geometry: Essential matrix II


### • Solve using direct linear transform

[Figure: triangulation — rays from the camera centers c1 and c2 through image points p1 and p2 pass near the 3D point q at distances d1 and d2]

### P × (M·q) = 0, solve for M


### • Among good enough matches, select the best one


### • Define intermediate coordinate system

• Origin at q1

• X-axis xq aligned with q2-q1

• Y-axis yq orthogonal to xq, lies in plane (q1,q2,q3)

• Z-axis zq is cross-product of xq and yq


### • Do the same from the ri side and concatenate the rotations
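The construction above can be sketched in NumPy (hypothetical helper name). The columns of the returned matrix are the frame's axes expressed in world coordinates, so the rotation aligning the q-side frame with the r-side frame is Fr · Fqᵀ:

```python
import numpy as np

def frame_from_points(q1, q2, q3):
    # Intermediate coordinate system as described above:
    # origin at q1, x along q2-q1, y in the (q1,q2,q3) plane, z = x cross y.
    x = q2 - q1
    x /= np.linalg.norm(x)
    y = (q3 - q1) - np.dot(q3 - q1, x) * x   # remove the x component
    y /= np.linalg.norm(y)
    z = np.cross(x, y)
    return np.column_stack([x, y, z])        # frame axes as matrix columns

q = [np.array(v, float) for v in ([0, 0, 0], [1, 0, 0], [0, 1, 0])]
r = [np.array(v, float) for v in ([5, 0, 0], [5, 1, 0], [5, 0, 1])]
Fq = frame_from_points(*q)
Fr = frame_from_points(*r)
R = Fr @ Fq.T   # rotation taking the q-triangle's orientation to the r-triangle's
```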


### FAST

[Figure: the 16 pixels on a circle around the candidate corner, numbered 1–16]

FAST searches for a contiguous sequence of pixels on a circle which are consistently lighter or darker than the center. An early exit can be achieved by first testing only the pixels at the top, bottom, left, and right (right image). Often, an improved detection method based on machine learning and a precompiled decision tree is used, allowing better generalization for arc lengths smaller than 12 pixels.


Image: Gerhard Reitmayr


### SIFT

SIFT determines gradient vectors (with orientations in 0…2π) for every pixel of the 8 × 8 image patch. A 2 × 2 descriptor array with 8-bin histograms, relating cumulative gradient vector magnitude to gradient orientation, is built.


### • Feature database stored as k-d tree for sub-linear search time


### Only peaks (majority voting algorithm)


### • All others must be on the same side (red line indicates not on the same side)


### P3P

P3P computes the distance d_i from the camera center c to a 3D point q_i


### Tukey Estimator
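As a sketch (the standard Tukey biweight with the usual 95%-efficiency tuning constant c ≈ 4.685; the function name is ours), the estimator assigns each residual a weight that smoothly falls to zero at the cutoff, so gross outliers stop influencing the fit — for example inside an iteratively reweighted least-squares loop:

```python
import numpy as np

def tukey_weight(e, c=4.685):
    # Tukey biweight: weight (1 - (e/c)^2)^2 inside the cutoff c, 0 outside,
    # so gross outliers get zero influence (unlike plain least squares).
    e = np.asarray(e, float)
    w = (1.0 - (e / c) ** 2) ** 2
    return np.where(np.abs(e) < c, w, 0.0)

residuals = np.array([0.0, 1.0, 3.0, 50.0])   # the last one is an outlier
weights = tukey_weight(residuals)             # outlier gets weight 0
```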


### are solved simultaneously


### Detection and Tracking

[Figure: state machine — Start → Detection (recognize target type, detect target, initialize camera pose; robust to blur, lighting changes, and tilt) → Incremental tracking (fast) while tracking is ok; on tracking target lost or not detected, return to Detection]

Tracking and detection are complementary approaches. After successful detection, the target is tracked incrementally. If the target is lost, detection is activated again.


### Motion Model

[Figure: active search in 2D and 3D — a 3D motion model predicts x_{t+1} from Δt = x_t − x_{t−1}; a 2D motion model predicts the next position of p from Δt = p_t − p_{t−1} and defines a search window in image space around the predicted point q]


### • Project patches into camera image and use normalized cross correlation (NCC) to match
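NCC itself is only a few lines (NumPy sketch); because the patch means and magnitudes are normalized out, the score is invariant to affine lighting changes (gain and offset):

```python
import numpy as np

def ncc(a, b):
    # Normalized cross correlation of two equally sized patches:
    # subtract the means, correlate, normalize by the magnitudes.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom

patch = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
same_but_brighter = 2.0 * patch + 10.0   # gain and offset change
score = ncc(patch, same_but_brighter)    # -> 1.0, a perfect match
```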


### Patch Tracking

A patch taken from the template image is affinely warped using the camera pose estimated from a motion model; the warped patch is then compared to the current camera image.


### Hierarchical Feature Matching

First, only a small number of interest points is used to obtain a first estimate of the camera pose. Then, the full set of interest points is considered at full resolution, but with a much smaller search window.


### • Tracking 100 features at full-res strongly reduces jitter


### When Does It Break? (1/2)


### When Does It Break? (2/2)


### Robust to lighting changes  


### Windowed Bundle Adjustment

Windowed bundle adjustment limits the computational effort by optimizing only over neighboring camera poses.

### • Example 2: KinectFusion


### Parallel Tracking and Mapping

[Figure: Tracking (estimates the camera pose for every frame) passes new keyframes to Mapping (extends and improves the map at a slow update rate)]

Parallel tracking and mapping uses two concurrent threads, one for tracking and one for mapping, which run at different speeds.


### Parallel Tracking and Mapping

[Figure: video stream → new frames → Tracking (fast), which outputs the tracked local pose and sends new keyframes to Mapping (slow) — simultaneous localization and mapping (SLAM) in small workspaces]

Klein/Drummond, U. Cambridge


### • Tracking at framerate, mapping at slower rate


### • Map Maintenance


### • New map points are found and added to the map


### • Repeat in four image pyramid levels


### • Try to measure new map features in old keyframes


### Small Blurry Images

Computed by resampling 640×480 to 40×30, then blurring with a Gaussian kernel of size 5 pixels. Used to re-detect the pose after tracking failure.


### • Needs baseline – walk several meters

Image: Christian Pirchheim


### Hybrid SLAM

A SLAM system that can handle both general 6DOF motion and pure rotation has the advantage that the user is not constrained to a certain type of motion. It also presents the opportunity to recover 3D features (magenta) from panoramic features (cyan).

Image: Christian Pirchheim


### Hybrid SLAM Results


The combination of 6DOF and panoramic SLAM delivers much more robust tracking performance during arbitrary user motion. (top) Conventional 6DOF SLAM can track the pose for only 53% of the frames. (bottom) Combined SLAM can track the pose in 98% of the frames.

Image: Christian Pirchheim


### • Model is used for future tracking

[Figure: KinectFusion pipeline — (1) raw depth image, (2) ICP camera tracking, (3) volumetric integration into the TSDF volume, (4) volumetric raycasting of the model, which feeds back into tracking]


### • No holes


### • KinectFusion: Real-Time Dense Surface Mapping, 2011 10th IEEE International Symposium on Mixed and Augmented Reality


### • Can be used to detect moving objects


### 5. Use pose to reproject depth map, go to step 2


### Truncated Signed Distance Function (TSDF)

[Figure: four 4 × 4 voxel grids storing TSDF values in the range −1.0 … 1.0; the zero crossing between negative and positive values marks the surface]

For every point in space, store the distance to the closest isosurface.
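A 1D sketch of how such values arise (illustrative NumPy code, not the KinectFusion implementation): along one ray, each voxel stores its depth difference to the measured surface, truncated to [−1, 1] and accumulated as a running weighted average over frames.

```python
import numpy as np

def integrate(tsdf, weight, depth, voxel_z, trunc=0.1):
    # Running weighted-average TSDF update for a single ray (1D sketch):
    # positive in front of the observed surface, negative behind it.
    sdf = np.clip((depth - voxel_z) / trunc, -1.0, 1.0)  # signed, truncated
    tsdf[:] = (tsdf * weight + sdf) / (weight + 1)       # weighted average
    weight += 1
    return tsdf, weight

voxel_z = np.linspace(0.0, 1.0, 11)   # voxel centers along one ray
tsdf = np.zeros(11)
weight = np.zeros(11)
tsdf, weight = integrate(tsdf, weight, depth=0.5, voxel_z=voxel_z)
# The zero crossing of the averaged TSDF marks the surface at z = 0.5
```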


### • Limited range of influence (number of updated voxels limited)


### • Texture coordinates can be found by projecting the vertices or voxels


### Sparse Point Cloud


The main square of Graz, Austria, represented as a 3D point cloud computed using SFM from a large set of panoramic images.

Courtesy of Clemens Arth


### View Cells


After reconstruction, the relevant parts of an urban area can be subdivided into cells. By preselecting cells based on a GPS measurement as a source, one can substantially prune the relevant portions of the reconstruction database.

Image: Clemens Arth


### Only consider features with the right orientation

[Figure: sensor priors — GPS, gravity, compass]

### Improvement with Sensor Priors


### • Detecting building outlines in the image [in progress]


### Sensor Priors


A magnetometer (compass) can be used as a source of prior information to narrow the search for point correspondences to those with a normal facing approximately toward the user.

Image: Clemens Arth


### Gravity Aligned Features


Features with an orientation aligned to gravity rather than to a visual attribute such as the gradient can be matched more reliably.

Image: Clemens Arth


### Visibility


The potentially visible set for the central square contains the street segments immediately connected to the square (blue arrows), but not the street segments after one or more turns (dashed red lines).


### Parallel Tracking, Mapping and Localization

[Figure: on the CLIENT, the video stream feeds new frames to Tracking (fast), which outputs the tracked global pose and exchanges data with Mapping (slow); new keyframes are sent to the SERVER, where Matching (slowest) against a wide-area visual feature database returns a global pose]

Conventional SLAM (blue) performs tracking and mapping simultaneously on a mobile client device. By adding a localization server (green), a third concurrent activity is added: matching to a global database of visual features for wide-area localization. Client and server operate independently, so the client can always run at the highest frame rate.


### Tracking in Panoramas


Image: Clemens Arth

With panoramic SLAM, the user may perform only rotational motion, such as when exploring the immediate environment.

Courtesy of Daniel Wagner


### Panorama Matching

The yellow lines show the feature matches obtained from a panoramic image. Note how certain directions, where facades are directly observed, perform very well, while directions facing down a street perform poorly. This illustrates why a wide field of view is needed for reliable outdoor localization.

Image: Clemens Arth


### Outdoor Localization Result


Multiple images from a sequence tracked with 6DOF SLAM on a client, while a localization server provides the global pose used to overlay the building outlines with transparent yellow structures.

Image: Jonathan Ventura and Clemens Arth


### Outdoor SLAM Result

This SLAM sequence starts with tracking a facade (overlaid in yellow), for which a global pose has been determined by a server. The images in the bottom row cannot continue tracking with information known to the server; the poster in the foreground, which has been incorporated into the SLAM map, is used for tracking instead.

Image: Jonathan Ventura and Clemens Arth
