# Chapter 4: Computer Vision

## Computer Vision

### Fundamentals

Why use a homogeneous coordinate system?
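A short illustration of the answer (a NumPy sketch, not from the book): in homogeneous coordinates a 2D point (x, y) becomes (x, y, 1), so translation — which is not a linear map in Cartesian coordinates — becomes an ordinary matrix multiplication, and a whole chain of rotations, translations, and projections collapses into a single 3 × 3 matrix.

```python
import numpy as np

# A 2D point (x, y) becomes (x, y, 1) in homogeneous coordinates.
# Translation then becomes a plain matrix multiplication, so it can be
# composed with rotations and projections by multiplying 3x3 matrices.

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])                    # the point (1, 0)
T = translation(2.0, 3.0) @ rotation(np.pi / 2)  # rotate 90 deg, then translate
q = T @ p                                        # one combined transform
print(q[:2] / q[2])                              # -> [2. 4.]
```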


### Rotation Representation with Quaternions

www.augmentedrealitybook.org Computer Vision 15


### Point P, after rotation by the quaternion q, is P′

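As a concrete sketch (plain NumPy, with hypothetical helper names): the rotated point is obtained as the quaternion product P′ = q · P · q*, where P is embedded as the pure quaternion (0, p) and q* is the conjugate of the unit quaternion q.

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions given as (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(q, p):
    # P' = q * P * q_conjugate, with P embedded as the pure quaternion (0, p)
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.array([0.0, *p])), q_conj)[1:]

# Unit quaternion for a 90-degree rotation about the z-axis:
theta = np.pi / 2
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
print(rotate(q, [1.0, 0.0, 0.0]))   # -> approximately [0, 1, 0]
```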

### Interpolation between two rotations is straightforward with quaternions, one being transformation 1 (or some other initial rotation) and the other being the intended final rotation. This is more problematic with other representations of rotations.
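A common way to perform this interpolation is spherical linear interpolation (slerp); the following NumPy sketch is illustrative and not necessarily the exact formulation used in the book:

```python
import numpy as np

def slerp(q0, q1, t):
    # Spherical linear interpolation between unit quaternions q0 and q1.
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # flip one quaternion to take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)   # angle between the two quaternions
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# Halfway between the identity and a 90-degree rotation about z:
q0 = np.array([1.0, 0.0, 0.0, 0.0])
q1 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
qh = slerp(q0, q1, 0.5)     # a 45-degree rotation about z
```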


### The second major version of OpenCV, OpenCV 2.0, was released in October 2009


### In OpenCV, use the function `solve(A, b, x, DECOMP_SVD);`
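For readers without OpenCV at hand, the same SVD-based least-squares solution of A·x = b can be sketched in NumPy (the matrix values are illustrative):

```python
import numpy as np

# cv::solve(A, b, x, DECOMP_SVD) computes the least-squares solution of
# A x = b via singular value decomposition; NumPy's lstsq is equivalent:
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # overdetermined: 3 equations, 2 unknowns
b = np.array([6.0, 0.0, 0.0])

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Or explicitly via the pseudo-inverse built from the SVD:
U, S, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / S)
```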


### 5. Use final pose


Image: Daniel Wagner


### • Assume K is known (for now)

[Figure: pinhole camera — center of projection c, optical axis, image plane π with image coordinates (u, v) at focal length f, scene point p projecting to image point q]

### Pinhole Camera (eye at origin)

[Figure: views along the x axis and the y axis — scene point P(x, y, z) projects onto the projection plane at distance d]

x_p = d·x / z, y_p = d·y / z

### • Cheaper threshold: compute locally and interpolate
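One way to realize this (an illustrative NumPy sketch, not a specific library's implementation): compute mean intensities on a coarse grid of cells, bilinearly interpolate them back to full resolution, and threshold each pixel against its interpolated local value.

```python
import numpy as np

def interpolated_threshold(img, grid=4, offset=5.0):
    # Compute the mean intensity in coarse cells of size grid x grid,
    # then bilinearly interpolate those means to a per-pixel threshold.
    h, w = img.shape
    gh, gw = h // grid, w // grid
    means = img[:gh * grid, :gw * grid].reshape(gh, grid, gw, grid).mean(axis=(1, 3))
    # Bilinear interpolation of the cell means back to full resolution:
    ys = np.linspace(0, gh - 1, h)
    xs = np.linspace(0, gw - 1, w)
    y0 = np.clip(ys.astype(int), 0, gh - 2)
    x0 = np.clip(xs.astype(int), 0, gw - 2)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    t = ((1 - fy) * (1 - fx) * means[y0][:, x0]
         + (1 - fy) * fx * means[y0][:, x0 + 1]
         + fy * (1 - fx) * means[y0 + 1][:, x0]
         + fy * fx * means[y0 + 1][:, x0 + 1])
    return img > (t - offset)   # True where a pixel exceeds its local threshold

img = np.full((16, 16), 50.0)
img[8, 8] = 255.0               # one bright pixel on a uniform background
out = interpolated_threshold(img)
```

Note how the bright pixel raises the interpolated threshold of its neighborhood, which a single global threshold could not do.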


### • Determine orientation from black corner at s


### Camera Pose [R | t]

### Homography Decomposition

[Figure: the homography column vectors h1, h2 and the orthonormalized rotation columns RC1, RC2, RC3]

To compute a pose from a homography, the rotation components of the homography need to be ortho-normalized first. See the next slide.
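A minimal NumPy sketch of one common ortho-normalization (plain Gram–Schmidt; the slides may use a symmetric variant based on the bisectors of h1 and h2), assuming the homography has already been normalized by K⁻¹, i.e. H ≈ [r1 r2 t] up to scale:

```python
import numpy as np

def pose_from_homography(H):
    # Recover [R | t] from a plane-induced homography H ~ [r1 r2 t]
    # (assumes H was pre-multiplied by K^-1; helper name is ours).
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    s = np.linalg.norm(h1)            # scale: r1 must be a unit vector
    r1 = h1 / s
    r2 = h2 - np.dot(r1, h2) * r1     # Gram-Schmidt: make r2 orthogonal to r1
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)             # right-handed third rotation column
    t = h3 / s
    return np.column_stack([r1, r2, r3]), t

# Check on a synthetic homography built from a known pose:
ct, st = np.cos(0.5), np.sin(0.5)
R_true = np.array([[ct, -st, 0.0],
                   [st,  ct, 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
H = 2.0 * np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R, t = pose_from_homography(H)
```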


### • Minimize using Gauss-Newton or Levenberg-Marquardt


### 5. Compute target pose (absolute orientation)


### Epipolar line (for example, the line through e_R and x_R)


### Cross Product in Matrix Representation


### • Therefore, a × b = [a]× · b, where [a]× is the skew-symmetric matrix of a
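This identity is easy to check numerically (NumPy sketch):

```python
import numpy as np

def skew(a):
    # [a]x: the skew-symmetric matrix with skew(a) @ b == np.cross(a, b)
    return np.array([[0.0,  -a[2],  a[1]],
                     [a[2],  0.0,  -a[0]],
                     [-a[1], a[0],  0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
```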


### Epipolar Constraint

P2ᵀ · F · P1 = 0, where F is the fundamental matrix: F = K2⁻ᵀ · [t]× · R · K1⁻¹

Fig. 4.3
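The formula can be checked numerically with a synthetic two-camera setup (an illustrative NumPy sketch; the K, R, t values are made up): for corresponding image points P1 and P2, the epipolar constraint P2ᵀ·F·P1 = 0 must hold.

```python
import numpy as np

def skew(t):
    return np.array([[0.0,  -t[2],  t[1]],
                     [t[2],  0.0,  -t[0]],
                     [-t[1], t[0],  0.0]])

def fundamental(K1, K2, R, t):
    # F = K2^-T [t]x R K1^-1  (the notation of Fig. 4.3)
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

# Sanity check with a synthetic point seen by two cameras:
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])       # pure sideways translation
X = np.array([0.2, -0.1, 4.0])      # 3D point in camera-1 coordinates
p1 = K @ X                          # homogeneous projection into image 1
p2 = K @ (R @ X + t)                # homogeneous projection into image 2
F = fundamental(K, K, R, t)
print(p2 @ F @ p1)                  # -> approximately 0
```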


### Multiple View Geometry: Essential Matrix and Fundamental Matrix


### Essential Matrix and Fundamental Matrix II (where A1, A2 = K1, K2 and m1, m2 = P1, P2 in Fig. 4.3)


### Epipolar geometry: Essential matrix


### Epipolar geometry: Essential matrix II


### • Solve using direct linear transform

[Figure: triangulation — rays from the camera centers c1 and c2 through image points p1 and p2 pass near the 3D point q at distances d1 and d2]

### P × (M·q) = 0, solve for M


### • Among good enough matches, select the best one


### • Define intermediate coordinate system

• Origin at q1

• X-axis xq aligned with q2-q1

• Y-axis yq orthogonal to xq, lies in plane (q1,q2,q3)

• Z-axis zq is cross-product of xq and yq


### • Do the same from the ri side and concatenate the rotations
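The construction above can be sketched in NumPy (hypothetical helper name). The columns of the returned matrix are the frame's axes expressed in world coordinates, so the rotation aligning the q-side frame with the r-side frame is Fr · Fqᵀ:

```python
import numpy as np

def frame_from_points(q1, q2, q3):
    # Intermediate coordinate system as described above:
    # origin at q1, x along q2-q1, y in the (q1,q2,q3) plane, z = x cross y.
    x = q2 - q1
    x /= np.linalg.norm(x)
    y = (q3 - q1) - np.dot(q3 - q1, x) * x   # remove the x component
    y /= np.linalg.norm(y)
    z = np.cross(x, y)
    return np.column_stack([x, y, z])        # frame axes as matrix columns

q = [np.array(v, float) for v in ([0, 0, 0], [1, 0, 0], [0, 1, 0])]
r = [np.array(v, float) for v in ([5, 0, 0], [5, 1, 0], [5, 0, 1])]
Fq = frame_from_points(*q)
Fr = frame_from_points(*r)
R = Fr @ Fq.T   # rotation taking the q-triangle's orientation to the r-triangle's
```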


### FAST

[Figure: the 16 pixels on a circle around the candidate corner, numbered 1–16]

FAST searches for a contiguous sequence of pixels on a circle which are consistently lighter or darker than the center. An early exit can be achieved by first testing only the pixels at the top, bottom, left, and right (right image). Often, an improved detection method based on machine learning and a precompiled decision tree is used, allowing better generalization for arc lengths smaller than 12 pixels.


Image: Gerhard Reitmayr


### SIFT

SIFT determines gradient vectors (with orientations in 0…2π) for every pixel of the 8 × 8 image patch. A 2 × 2 descriptor array with 8-bin histograms, relating cumulative gradient vector magnitude to gradient orientation, is built.


### • Feature database stored as k-d tree for sub-linear search time


### Only peaks (majority voting algorithm)


### • All others must be on the same side (red line indicates not on the same side)


### P3P

P3P computes the distance d_i from the camera center c to a 3D point q_i


### Tukey Estimator
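As a sketch (the standard Tukey biweight with the usual 95%-efficiency tuning constant c ≈ 4.685; the function name is ours), the estimator assigns each residual a weight that smoothly falls to zero at the cutoff, so gross outliers stop influencing the fit — for example inside an iteratively reweighted least-squares loop:

```python
import numpy as np

def tukey_weight(e, c=4.685):
    # Tukey biweight: weight (1 - (e/c)^2)^2 inside the cutoff c, 0 outside,
    # so gross outliers get zero influence (unlike plain least squares).
    e = np.asarray(e, float)
    w = (1.0 - (e / c) ** 2) ** 2
    return np.where(np.abs(e) < c, w, 0.0)

residuals = np.array([0.0, 1.0, 3.0, 50.0])   # the last one is an outlier
weights = tukey_weight(residuals)             # outlier gets weight 0
```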


### are solved simultaneously


### Detection and Tracking

[Figure: state machine — Start → Detection (recognize target type, detect target, initialize camera pose; robust to blur, lighting changes, and tilt) → Incremental tracking (fast) while tracking is ok; on tracking target lost or not detected, return to Detection]

Tracking and detection are complementary approaches. After successful detection, the target is tracked incrementally. If the target is lost, detection is activated again.


### Motion Model

[Figure: active search in 2D and 3D — a 3D motion model predicts x_{t+1} from Δt = x_t − x_{t−1}; a 2D motion model predicts the next position of p from Δt = p_t − p_{t−1} and defines a search window in image space around the predicted point q]


### • Project patches into camera image and use normalized cross correlation (NCC) to match
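NCC itself is only a few lines (NumPy sketch); because the patch means and magnitudes are normalized out, the score is invariant to affine lighting changes (gain and offset):

```python
import numpy as np

def ncc(a, b):
    # Normalized cross correlation of two equally sized patches:
    # subtract the means, correlate, normalize by the magnitudes.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom

patch = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
same_but_brighter = 2.0 * patch + 10.0   # gain and offset change
score = ncc(patch, same_but_brighter)    # -> 1.0, a perfect match
```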


### Patch Tracking

A patch taken from the template image is affinely warped using the camera pose estimated from a motion model; the warped patch is then compared to the current camera image.


### Hierarchical Feature Matching

First, only a small number of interest points is used to obtain a first estimate of the camera pose. Then, the full set of interest points is considered at full resolution, but with a much smaller search window.


### • Tracking 100 features at full-res strongly reduces jitter


### When Does It Break? (1/2)


### When Does It Break? (2/2)


### Robust to lighting changes  


### Windowed Bundle Adjustment

Windowed bundle adjustment limits the computational effort by optimizing only over neighboring camera poses.

### • Example 2: KinectFusion


### Parallel Tracking and Mapping

[Figure: Tracking (estimates the camera pose for every frame) passes new keyframes to Mapping (extends and improves the map at a slow update rate)]

Parallel tracking and mapping uses two concurrent threads, one for tracking and one for mapping, which run at different speeds.


### Parallel Tracking and Mapping

[Figure: video stream → new frames → Tracking (fast), which outputs the tracked local pose and sends new keyframes to Mapping (slow) — simultaneous localization and mapping (SLAM) in small workspaces]

Klein/Drummond, U. Cambridge


### • Tracking at framerate, mapping at slower rate


### • Map Maintenance


### • New map points are found and added to the map


### • Repeat in four image pyramid levels


### • Try to measure new map features in old keyframes


### Small Blurry Images

Computed by resampling 640×480 to 40×30, then blurring with a Gaussian kernel of size 5 pixels. Used to re-detect the pose after tracking failure.


### • Needs baseline – walk several meters

Image: Christian Pirchheim


### Hybrid SLAM

A SLAM system that can handle both general 6DOF motion and pure rotation has the advantage that the user is not constrained to a certain type of motion. It also presents the opportunity to recover 3D features (magenta) from panoramic features (cyan).

Image: Christian Pirchheim


### Hybrid SLAM Results


The combination of 6DOF and panoramic SLAM delivers much more robust tracking performance during arbitrary user motion. (top) Conventional 6DOF SLAM can track the pose for only 53% of the frames. (bottom) Combined SLAM can track the pose in 98% of the frames.

Image: Christian Pirchheim


### • Model is used for future tracking

[Figure: KinectFusion pipeline — (1) raw depth image, (2) ICP camera tracking, (3) volumetric integration into the TSDF volume, (4) volumetric raycasting of the model, which feeds back into tracking]


### • No holes


### • KinectFusion: Real-Time Dense Surface Mapping, 2011 10th IEEE International Symposium on Mixed and Augmented Reality


### • Can be used to detect moving objects


### 5. Use pose to reproject depth map, go to step 2


### Truncated Signed Distance Function (TSDF)

[Figure: four 4 × 4 voxel grids storing TSDF values in the range −1.0 … 1.0; the zero crossing between negative and positive values marks the surface]

For every point in space, store the distance to the closest isosurface.
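A 1D sketch of how such values arise (illustrative NumPy code, not the KinectFusion implementation): along one ray, each voxel stores its depth difference to the measured surface, truncated to [−1, 1] and accumulated as a running weighted average over frames.

```python
import numpy as np

def integrate(tsdf, weight, depth, voxel_z, trunc=0.1):
    # Running weighted-average TSDF update for a single ray (1D sketch):
    # positive in front of the observed surface, negative behind it.
    sdf = np.clip((depth - voxel_z) / trunc, -1.0, 1.0)  # signed, truncated
    tsdf[:] = (tsdf * weight + sdf) / (weight + 1)       # weighted average
    weight += 1
    return tsdf, weight

voxel_z = np.linspace(0.0, 1.0, 11)   # voxel centers along one ray
tsdf = np.zeros(11)
weight = np.zeros(11)
tsdf, weight = integrate(tsdf, weight, depth=0.5, voxel_z=voxel_z)
# The zero crossing of the averaged TSDF marks the surface at z = 0.5
```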


### • Limited range of influence (number of updated voxels limited)


### • Texture coordinates can be found by projecting the vertices or voxels


### Sparse Point Cloud


The main square of Graz, Austria, represented as a 3D point cloud computed using SFM from a large set of panoramic images.

Courtesy of Clemens Arth


### View Cells


After reconstruction, the relevant parts of an urban area can be subdivided into cells. By preselecting cells based on a GPS measurement as a source, one can substantially prune the relevant portions of the reconstruction database.

Image: Clemens Arth


### Only consider features with the right orientation

[Figure: sensor priors — GPS, gravity, compass]

### Improvement with Sensor Priors


### • Detecting building outlines in the image [in progress]


### Sensor Priors


A magnetometer (compass) can be used as a source of prior information to narrow the search for point correspondences to those with a normal facing approximately toward the user.

Image: Clemens Arth


### Gravity Aligned Features


Features with an orientation aligned to gravity rather than to a visual attribute such as the gradient can be matched more reliably.

Image: Clemens Arth


### Visibility


The potentially visible set for the central square contains the street segments immediately connected to the square (blue arrows), but not the street segments after one or more turns (dashed red lines).


### Parallel Tracking, Mapping and Localization

[Figure: on the CLIENT, the video stream feeds new frames to Tracking (fast), which outputs the tracked global pose and exchanges data with Mapping (slow); new keyframes are sent to the SERVER, where Matching (slowest) against a wide-area visual feature database returns a global pose]

Conventional SLAM (blue) performs tracking and mapping simultaneously on a mobile client device. By adding a localization server (green), a third concurrent activity is added: matching to a global database of visual features for wide-area localization. Client and server operate independently, so the client can always run at the highest frame rate.


### Tracking in Panoramas


Image: Clemens Arth

With panoramic SLAM, the user may perform only rotational motion, such as when exploring the immediate environment.

Courtesy of Daniel Wagner


### Panorama Matching

The yellow lines show the feature matches obtained from a panoramic image. Note how certain directions, where facades are directly observed, perform very well, while directions facing down a street perform poorly. This illustrates why a wide field of view is needed for reliable outdoor localization.

Image: Clemens Arth


### Outdoor Localization Result


Multiple images from a sequence tracked with 6DOF SLAM on a client, while a localization server provides the global pose used to overlay the building outlines with transparent yellow structures.

Image: Jonathan Ventura and Clemens Arth


### Outdoor SLAM Result

This SLAM sequence starts with tracking a facade (overlaid in yellow), for which a global pose has been determined by a server. The images in the bottom row cannot continue tracking with information known to the server; the poster in the foreground, which has been incorporated into the SLAM map, is used for tracking instead.

Image: Jonathan Ventura and Clemens Arth
