### More on Features

Digital Visual Effects, Spring 2007
*Yung-Yu Chuang*

2007/3/27

*with slides by Trevor Darrell, Cordelia Schmid, David Lowe, Darya Frolova, Denis Simakov,*
*Robert Collins, Brad Osgood, W W L Chen, and Jiwon Kim*

**Announcements**

• Project #1 was due at noon today. You have a total of 10 delay days without penalty, but you are advised to use them wisely.

• We reserve the right not to include late homework in the artifact voting.

• Project #2 handout will be available on the web today.

• We may not have class next week. I will send out email if the class is canceled.

**Outline**

• Harris corner detector

• SIFT

• SIFT extensions

• MSOP

**Three components for features **

• Feature detection

• Feature description

• Feature matching

**Harris corner detector**

• Consider all small shifts by Taylor's expansion:

$$E(u,v) = A u^2 + 2 C u v + B v^2$$

$$A = \sum_{x,y} w(x,y)\, I_x^2(x,y), \qquad B = \sum_{x,y} w(x,y)\, I_y^2(x,y), \qquad C = \sum_{x,y} w(x,y)\, I_x(x,y)\, I_y(x,y)$$

**Harris corner detector**

Equivalently, for small shifts [*u*,*v*] we have a bilinear approximation:

$$E(u,v) \cong \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

where *M* is a 2×2 matrix computed from image derivatives:

$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
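A minimal sketch of how the per-pixel entries of M could be computed with NumPy/SciPy; the Sobel derivative and the Gaussian window width are illustrative choices, not values given on the slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def second_moment_matrix(image, sigma=1.5):
    """Per-pixel entries A, B, C of the Harris matrix M (a sketch;
    sigma is the width of the Gaussian window w(x, y))."""
    img = np.asarray(image, dtype=float)
    Ix = sobel(img, axis=1)                # horizontal derivative
    Iy = sobel(img, axis=0)                # vertical derivative
    A = gaussian_filter(Ix * Ix, sigma)    # weighted sum of Ix^2
    B = gaussian_filter(Iy * Iy, sigma)    # weighted sum of Iy^2
    C = gaussian_filter(Ix * Iy, sigma)    # weighted sum of Ix*Iy
    return A, B, C                         # M = [[A, C], [C, B]] at every pixel
```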

**Harris corner detector (matrix form)**

$$E(\mathbf{u}) = \sum \left| I(\mathbf{x}_0 + \mathbf{u}) - I(\mathbf{x}_0) \right|^2 \approx \sum \left( I(\mathbf{x}_0) + \frac{\partial I}{\partial \mathbf{x}}^T \mathbf{u} - I(\mathbf{x}_0) \right)^2 = \sum \mathbf{u}^T \frac{\partial I}{\partial \mathbf{x}} \frac{\partial I}{\partial \mathbf{x}}^T \mathbf{u} = \mathbf{u}^T \mathbf{H}\, \mathbf{u}, \qquad \mathbf{H} = \sum \frac{\partial I}{\partial \mathbf{x}} \frac{\partial I}{\partial \mathbf{x}}^T$$

**Quadratic forms**

• Quadratic form (homogeneous polynomial of degree two) of *n* variables $x_i$:

$$f(x_1,\dots,x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\, x_i x_j$$

• Examples

**Symmetric matrices**

• Quadratic forms can be represented by a real symmetric matrix **A** where

$$f(\mathbf{x}) = \mathbf{x}^T \mathbf{A}\, \mathbf{x}, \qquad a_{ij} = a_{ji}$$

**Eigenvalues of symmetric matrices**

*Brad Osgood*

**Eigenvectors of symmetric matrices**

$$\mathbf{x}^T\mathbf{A}\mathbf{x} = \mathbf{x}^T\mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T\mathbf{x} = \left(\mathbf{Q}^T\mathbf{x}\right)^T\mathbf{\Lambda}\left(\mathbf{Q}^T\mathbf{x}\right) = \mathbf{y}^T\mathbf{\Lambda}\mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2$$

subject to $\mathbf{x}^T\mathbf{x} = 1$; the extremal directions are the eigenvectors $q_1$ (eigenvalue $\lambda_1$) and $q_2$ (eigenvalue $\lambda_2$).

**Visualize quadratic functions**

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^T$$

**Visualize quadratic functions**

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^T$$

**Visualize quadratic functions**

$$A = \begin{bmatrix} 3.25 & -1.30 \\ -1.30 & 1.75 \end{bmatrix} = \begin{bmatrix} 0.50 & 0.87 \\ 0.87 & -0.50 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 0.50 & 0.87 \\ 0.87 & -0.50 \end{bmatrix}^T$$

**Visualize quadratic functions**

$$A = \begin{bmatrix} 7.75 & -3.90 \\ -3.90 & 3.25 \end{bmatrix} = \begin{bmatrix} 0.50 & 0.87 \\ 0.87 & -0.50 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 10 \end{bmatrix} \begin{bmatrix} 0.50 & 0.87 \\ 0.87 & -0.50 \end{bmatrix}^T$$
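A quick numerical check of the decomposition above (a sketch; the 0.50/0.87 entries are the slide's rounded eigenvector values, so the reconstruction is only approximate):

```python
import numpy as np

Q = np.array([[0.50, 0.87],
              [0.87, -0.50]])      # columns ~ unit eigenvectors, rounded to two decimals
Lam = np.diag([1.0, 4.0])          # eigenvalues of the third example
A = Q @ Lam @ Q.T                  # ~ [[3.25, -1.30], [-1.30, 1.75]]

vals, vecs = np.linalg.eigh(A)     # recover the eigen-decomposition of the symmetric matrix
print(np.round(A, 2))              # close to the slide's matrix
print(np.round(vals, 2))           # eigenvalues close to [1, 4]
```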

**Harris corner detector**

$$E(u,v) \cong \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

Intensity change in shifting window: eigenvalue analysis. λ₁, λ₂ are the eigenvalues of *M*.

(Figure: the ellipse E(u,v) = const; its axis lengths are (λ_max)^(-1/2) and (λ_min)^(-1/2), pointing along the directions of fastest and slowest intensity change.)

**Harris corner detector**

Classification of image points using the eigenvalues of *M*:

• Corner: λ₁ and λ₂ are large, λ₁ ~ λ₂; *E* increases in all directions

• Edge: λ₁ >> λ₂ or λ₂ >> λ₁

• Flat: λ₁ and λ₂ are small; *E* is almost constant in all directions

**Harris corner detector**

Measure of corner response:

$$R = \det M - k\,(\operatorname{trace} M)^2, \qquad \det M = \lambda_1\lambda_2, \quad \operatorname{trace} M = \lambda_1 + \lambda_2$$

(*k* is an empirical constant, *k* = 0.04-0.06.) The eigenvalues of $M = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}$ are

$$\lambda = \frac{(a_{00} + a_{11}) \pm \sqrt{(a_{00} - a_{11})^2 + 4\,a_{01}a_{10}}}{2}$$
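A small sketch of the response computation, using the per-pixel A, B, C maps from a second-moment computation like the one sketched earlier (names are illustrative):

```python
import numpy as np

def harris_response(A, B, C, k=0.04):
    """R = det(M) - k * trace(M)^2 for every pixel; k in the 0.04-0.06 range."""
    det_M = A * B - C * C
    trace_M = A + B
    return det_M - k * trace_M ** 2

def harris_eigenvalues(A, B, C):
    """Closed-form eigenvalues of each 2x2 matrix M = [[A, C], [C, B]]."""
    root = np.sqrt((A - B) ** 2 + 4 * C * C)
    return (A + B - root) / 2, (A + B + root) / 2
```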

**Harris corner detector**

**Summary of Harris detector**

**Now we know where features are**

• But how do we match them?

• What is the descriptor for a feature? The simplest solution is the intensities of its spatial neighbors, but this might not be robust to brightness changes or small shifts/rotations.

**Harris Detector: Some Properties**

• Rotation invariance

The ellipse rotates but its shape (i.e. its eigenvalues) remains the same.

Corner response *R* is invariant to image rotation.

**Harris Detector: Some Properties**

*• But: non-invariant to image scale!*

All points will be classified as edges at the finer scale, even though the same structure is a corner at the coarser scale.

**Scale invariant detection**

• The problem: how do we choose corresponding circles independently in each image?

• Aperture problem

**SIFT **

**(Scale Invariant** **Feature Transform)**

**SIFT**

• SIFT is a carefully designed procedure with empirically determined parameters for extracting invariant and distinctive features.

**SIFT stages:**

• Scale-space extrema detection

• Keypoint localization

• Orientation assignment

• Keypoint descriptor

(Pipeline: detector → descriptor → matching; each keypoint gets a local descriptor.)

**A 500x500 image gives about 2000 features**

**1. Detection of scale-space extrema**

• For scale invariance, search for stable features across all possible scales using a continuous function of scale, known as scale space.

• SIFT uses the DoG filter for scale space because it is efficient and as stable as the scale-normalized Laplacian of Gaussian.

**DoG filtering**

Convolution with a variable-scale Gaussian: $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$

Difference-of-Gaussian (DoG) filter: $G(x, y, k\sigma) - G(x, y, \sigma)$

Convolution with the DoG filter: $D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$
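A simplified sketch of one octave of this construction with SciPy; it blurs the input directly to each absolute scale (a simplification of Lowe's incremental blurring), and the parameter names are illustrative:

```python
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, s=3):
    """One octave of Gaussian images and their differences (DoG)."""
    k = 2.0 ** (1.0 / s)
    gaussians = [gaussian_filter(image, sigma * k ** i) for i in range(s + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs   # s + 3 Gaussian images, s + 2 DoG images
```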

**Scale space**

σ doubles for the next octave; within an octave, adjacent scales differ by a factor $k = 2^{1/s}$.

Dividing the scale space into octaves is for efficiency only.

**Detection of scale-space extrema**

**Keypoint localization**

X is selected if it is larger or smaller than all 26 neighbors (8 in its own scale plus 9 in each of the scales above and below).

**Decide scale sampling frequency**

• It is impossible to sample the whole space; we trade off efficiency against completeness.

• Decide the best sampling frequency by experimenting on 32 real images subjected to synthetic transformations (rotation, scaling, affine stretch, brightness and contrast change, added noise, ...).

**Decide scale sampling frequency**

S = 3; for larger S, there are too many unstable features. (The detector is evaluated by repeatability, the descriptor by distinctiveness.)

**Decide scale sampling frequency**

**Pre-smoothing**

σ = 1.6, plus an initial doubling of the image size

**Scale invariance**

**2. Accurate keypoint localization**

• Reject points with low contrast (flat regions) and points poorly localized along an edge

• Fit a 3D quadratic function for sub-pixel maxima

1D illustration with samples $f(-1) = 1$, $f(0) = 6$, $f(+1) = 5$:

**2. Accurate keypoint localization**

• Reject points with low contrast and points poorly localized along an edge

• Fit a 3D quadratic function for sub-pixel maxima

$$f(x) \approx f(0) + f'(0)\,x + \tfrac{1}{2} f''(0)\,x^2 = 6 + \tfrac{5-1}{2}\,x + \tfrac{5+1-2\cdot 6}{2}\,x^2 = 6 + 2x - 3x^2$$

$$f'(x) = 2 - 6x = 0 \;\Rightarrow\; \hat{x} = \tfrac{1}{3}$$

$$f(\hat{x}) = 6 + 2\cdot\tfrac{1}{3} - 3\cdot\big(\tfrac{1}{3}\big)^2 = 6\tfrac{1}{3}$$
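The same 1D fit written as a tiny helper, a sketch rather than code from the slides:

```python
def subpixel_peak_1d(f_m1, f_0, f_p1):
    """Fit a parabola through samples at x = -1, 0, +1 and return the
    offset and value of its extremum."""
    d1 = (f_p1 - f_m1) / 2.0          # f'(0) by central difference
    d2 = f_p1 + f_m1 - 2.0 * f_0      # f''(0)
    x_hat = -d1 / d2                  # extremum of f(0) + d1*x + (d2/2)*x^2
    value = f_0 + 0.5 * d1 * x_hat    # f(x_hat) = f(0) + 1/2 f'(0) x_hat
    return x_hat, value

# The slide's example: f(-1)=1, f(0)=6, f(+1)=5  ->  (0.333..., 6.333...)
print(subpixel_peak_1d(1.0, 6.0, 5.0))
```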

**2. Accurate keypoint localization**

• Taylor series of several variables

• Two variables:

$$f(x,y) \approx f(0,0) + \frac{\partial f}{\partial x}x + \frac{\partial f}{\partial y}y + \frac{1}{2}\left( \frac{\partial^2 f}{\partial x^2}x^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}xy + \frac{\partial^2 f}{\partial y^2}y^2 \right)$$

$$f(x,y) \approx f(0,0) + \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \frac{1}{2} \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} \\ \frac{\partial^2 f}{\partial x\,\partial y} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

$$f(\mathbf{x}) \approx f(\mathbf{0}) + \frac{\partial f}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2}\,\mathbf{x}^T \frac{\partial^2 f}{\partial \mathbf{x}^2}\,\mathbf{x}$$

**Accurate keypoint localization**

**• Taylor expansion in matrix form: x is a vector, ***f*** maps x to a scalar**

$$\frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix} \quad \text{(gradient)}$$

$$\frac{\partial^2 f}{\partial \mathbf{x}^2} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix} \quad \text{(Hessian matrix, often symmetric)}$$

**2D illustration** **2D example**


**Derivation of matrix form**

$$h(\mathbf{x}) = \mathbf{g}^T \mathbf{x} = \begin{pmatrix} g_1 & \cdots & g_n \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n} g_i x_i$$

$$\frac{\partial h}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial h}{\partial x_1} \\ \vdots \\ \frac{\partial h}{\partial x_n} \end{pmatrix} = \begin{pmatrix} g_1 \\ \vdots \\ g_n \end{pmatrix} = \mathbf{g}$$

**Derivation of matrix form**

$$h(\mathbf{x}) = \mathbf{x}^T \mathbf{A}\, \mathbf{x} = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix} \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\, x_i x_j$$

$$\frac{\partial h}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial h}{\partial x_1} \\ \vdots \\ \frac{\partial h}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \sum_j a_{1j} x_j + \sum_i a_{i1} x_i \\ \vdots \\ \sum_j a_{nj} x_j + \sum_i a_{in} x_i \end{pmatrix} = \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} = (\mathbf{A} + \mathbf{A}^T)\,\mathbf{x}$$

**Derivation of matrix form**

$$h(\mathbf{x}) = f + \frac{\partial f}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2}\,\mathbf{x}^T \frac{\partial^2 f}{\partial \mathbf{x}^2}\,\mathbf{x}$$

$$\frac{\partial h}{\partial \mathbf{x}} = \frac{\partial f}{\partial \mathbf{x}} + \frac{\partial^2 f}{\partial \mathbf{x}^2}\,\mathbf{x}$$

**Accurate keypoint localization**

• Setting the derivative of the fitted quadratic to zero gives the offset $\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}}$, where **x** = (x, y, σ)ᵀ is a 3-vector

• Change the sample point if the offset is larger than 0.5 in any dimension

• Throw out low contrast (< 0.03)
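A compact sketch of this refinement step, assuming the gradient and Hessian of D in (x, y, σ) at the sample point are already available (names are illustrative, not from the slides):

```python
import numpy as np

def refine_keypoint(grad, hessian, contrast, threshold=0.03):
    """One step of the 3D quadratic refinement: offset x_hat = -H^{-1} g,
    re-sample if the offset exceeds 0.5, reject if the interpolated
    contrast |D(x_hat)| is below the threshold."""
    x_hat = -np.linalg.solve(hessian, grad)
    if np.any(np.abs(x_hat) > 0.5):
        return None, "move to the neighboring sample point and repeat"
    d_hat = contrast + 0.5 * grad.dot(x_hat)   # D(x_hat) = D + 1/2 (dD/dx)^T x_hat
    if abs(d_hat) < threshold:
        return None, "rejected: low contrast"
    return x_hat, d_hat
```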

**Accurate keypoint localization**

• Throw out low contrast: $|D(\hat{\mathbf{x}})| < 0.03$, where

$$D(\hat{\mathbf{x}}) = D + \frac{\partial D}{\partial \mathbf{x}}^T\hat{\mathbf{x}} + \frac{1}{2}\,\hat{\mathbf{x}}^T\frac{\partial^2 D}{\partial \mathbf{x}^2}\,\hat{\mathbf{x}}$$

Substituting $\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}}$:

$$D(\hat{\mathbf{x}}) = D - \frac{\partial D}{\partial \mathbf{x}}^T\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}} + \frac{1}{2}\,\frac{\partial D}{\partial \mathbf{x}}^T\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}} = D + \frac{1}{2}\,\frac{\partial D}{\partial \mathbf{x}}^T\hat{\mathbf{x}}$$

**Eliminating edge responses**

Hessian matrix at the keypoint location:

$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}, \qquad \operatorname{Tr}(\mathbf{H}) = D_{xx} + D_{yy} = \alpha + \beta, \qquad \operatorname{Det}(\mathbf{H}) = D_{xx}D_{yy} - D_{xy}^2 = \alpha\beta$$

Let $r = \alpha/\beta$ be the ratio of the principal curvatures. Keep the points with

$$\frac{\operatorname{Tr}(\mathbf{H})^2}{\operatorname{Det}(\mathbf{H})} < \frac{(r+1)^2}{r}, \qquad r = 10$$
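A direct translation of this test into a small check (a sketch; Dxx, Dyy, Dxy are the second differences of D at the keypoint):

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the point when trace(H)^2 / det(H) < (r + 1)^2 / r."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                 # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r
```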

**Maxima in D** **Remove low contrast and edges**

**Keypoint detector**

**233x89 image: 832 extrema → 729 after contrast filtering → 536 after curvature filtering**

**3. Orientation assignment**

• By assigning a consistent orientation, the keypoint descriptor can be made orientation invariant.

• For a keypoint, *L* is the Gaussian-smoothed image with the closest scale; gradient magnitude and orientation are computed from $(L_x, L_y)$:

$$m(x,y) = \sqrt{\big(L(x+1,y)-L(x-1,y)\big)^2 + \big(L(x,y+1)-L(x,y-1)\big)^2}$$

$$\theta(x,y) = \tan^{-1}\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)}$$

• Build an orientation histogram (36 bins) of these orientations.

**Orientation assignment** **Orientation assignment**

**Orientation assignment** **Orientation assignment**

σ = 1.5 × scale of the keypoint

**Orientation assignment** **Orientation assignment**

**Orientation assignment**

**The accurate peak position is determined by fitting a parabola to the histogram values around the peak.**

**Orientation assignment**

36-bin orientation histogram over 360°, weighted by gradient magnitude m and a Gaussian falloff with σ = 1.5 × scale. The highest peak gives the keypoint orientation.

Any local peak within 80% of the highest peak creates an additional orientation for the keypoint.

About 15% of keypoints have multiple orientations, and they contribute significantly to stability.
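A minimal sketch of building this histogram and extracting the peak orientations (array names and the precise binning are illustrative):

```python
import numpy as np

def orientation_histogram(mag, theta, weight, bins=36):
    """36-bin histogram of gradient orientations, each sample weighted by its
    magnitude and a Gaussian falloff; returns the histogram and the bins that
    are local peaks within 80% of the highest peak."""
    hist = np.zeros(bins)
    idx = np.floor(bins * (theta % (2 * np.pi)) / (2 * np.pi)).astype(int) % bins
    np.add.at(hist, idx, mag * weight)
    peak = hist.max()
    orientations = [b for b in range(bins)
                    if hist[b] >= 0.8 * peak
                    and hist[b] > hist[b - 1] and hist[b] > hist[(b + 1) % bins]]
    return hist, orientations
```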

**SIFT descriptor** **4. Local image descriptor**

• Thresholded image gradients are sampled over 16x16 array of locations in scale space

• Create array of orientation histograms (w.r.t. key orientation)

• 8 orientations x 4x4 histogram array = 128 dimensions

• Normalize to unit length, clip values larger than 0.2, then renormalize

Gaussian weighting with σ = 0.5 × descriptor window width
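The normalize-clip-renormalize step as a small helper (a sketch of the idea):

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    """Unit-normalize the 128-d descriptor, clip entries above 0.2 to reduce
    the influence of large gradient magnitudes, then renormalize."""
    vec = vec / (np.linalg.norm(vec) + 1e-12)
    vec = np.minimum(vec, clip)
    return vec / (np.linalg.norm(vec) + 1e-12)
```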

**Why 4x4x8?** **Sensitivity to affine change**

**Feature matching**

• For a feature x, find the closest feature x_{1} and the second closest feature x_{2}. If the distance ratio d(x, x_{1}) / d(x, x_{2}) is smaller than 0.8, the match is accepted.
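A brute-force sketch of this ratio test (in practice a k-d tree or BBF search replaces the exhaustive distance computation):

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match each descriptor in desc1 to its nearest neighbor in desc2; accept
    the match only if d(nearest) < ratio * d(second nearest)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches
```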

**SIFT flow**

**Maxima in D** **Remove low contrast**

**Remove edges** **SIFT descriptor**

**Estimated rotation**

• Computed affine transformation from rotated image to original image:

$$\begin{bmatrix} 0.7060 & -0.7052 & 128.4230 \\ 0.7057 & 0.7100 & -128.9491 \\ 0 & 0 & 1.0000 \end{bmatrix}$$

• Actual transformation from rotated image to original image:

$$\begin{bmatrix} 0.7071 & -0.7071 & 128.6934 \\ 0.7071 & 0.7071 & -128.6934 \\ 0 & 0 & 1.0000 \end{bmatrix}$$

**Applications**

**Recognition**

**SIFT Features**

**3D object recognition** **3D object recognition**

**Office of the past**

Video of desk / Images from PDF

**Track & **

**recognize**

Frames T and T+1 are matched against an internal representation.

**Scene Graph**

(Figure: the documents on the desk represented as a scene graph.)

**Image retrieval**

More than 5000 images; change in viewing angle; 22 correct matches

**Image retrieval**

More than 5000 images; change in viewing angle plus scale change

**Image retrieval**

**Robot location** **Robotics: Sony Aibo**

SIFT is used for:

• Recognizing the charging station

• Communicating with visual cards

• Teaching object recognition

• Soccer

**Structure from Motion**

• The SFM Problem

– Reconstruct scene geometry and camera motion from two or more images

**SFM pipeline:** track 2D features → estimate 3D → optimize (bundle adjust) → fit surfaces

**Structure from Motion**

**Poor mesh** **Good mesh**

**Augmented reality** **Automatic image stitching**

**Automatic image stitching** **Automatic image stitching**

**Automatic image stitching** **Automatic image stitching**

**SIFT extensions**

**PCA**

**PCA-SIFT**

• Only change step 4

• Pre-compute an eigen-space for local gradient patches of size 41x41

• 2x39x39=3042 elements

• Only keep 20 components

• A more compact descriptor

**GLOH (Gradient location-orientation histogram)**

• 17 location bins × 16 orientation bins

• Analyze the 17×16 = 272-d eigenspace, keep 128 components

• SIFT is still considered the best.

**Multi-Scale Oriented Patches**

• Simpler than SIFT. Designed for image matching.

[Brown, Szeliski, Winder, CVPR’2005]

• Feature detector

– Multi-scale Harris corners

– Orientation from blurred gradient

– Geometrically invariant to rotation

• Feature descriptor

– Bias/gain normalized sampling of local patch (8x8)

– Photometrically invariant to affine changes in intensity

**Multi-Scale Harris corner detector**

• Image stitching is mostly concerned with matching images that have the same scale, so a sub-octave pyramid might not be necessary.


**Multi-Scale Harris corner detector**

Use a smoothed version of the gradients. Corner detection function:

$$f = \frac{\det \mathbf{H}}{\operatorname{tr}\mathbf{H}} = \frac{\lambda_1\lambda_2}{\lambda_1 + \lambda_2}$$

Pick local maxima within a 3×3 window whose response is larger than 10.

**Keypoint detection function**

Experiments show roughly the same performance.

**Non-maximal suppression**

• Restrict the maximal number of interest points while keeping them spatially well distributed

• Only retain maxima in a neighborhood of radius *r*

• Sort points by strength and decrease *r* from infinity until the desired number of keypoints (500) is reached (see the sketch below)
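A sketch of this adaptive non-maximal suppression: for each point, compute its suppression radius (distance to the nearest stronger point), then keep the points with the largest radii, which is equivalent to shrinking r until the target count survives. Names and the simplified robustness handling are illustrative:

```python
import numpy as np

def adaptive_nonmax_suppression(points, strengths, n_keep=500):
    """Return indices of the n_keep spatially best-distributed strong points."""
    points = np.asarray(points, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    order = np.argsort(-strengths)                 # strongest first
    pts = points[order]
    radii = np.full(len(pts), np.inf)              # strongest point keeps radius = inf
    for i in range(1, len(pts)):
        d = np.linalg.norm(pts[:i] - pts[i], axis=1)   # distances to stronger points
        radii[i] = d.min()
    keep = order[np.argsort(-radii)[:n_keep]]      # map back to original indices
    return keep
```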

**Non-maximal suppression**

**Sub-pixel refinement** **Orientation assignment**

• Orientation = blurred gradient

**Descriptor Vector**

• Rotation Invariant Frame

– Scale-space position (x, y, s) + orientation (θ)

**MOPS descriptor vector**

• 8x8 oriented patch sampled at 5 x scale. See TR for details.

• Sampled with a spacing of 5 pixels between samples (an 8×8 grid covering a 40×40-pixel window)

**MOPS descriptor vector**

• 8x8 oriented patch sampled at 5 x scale. See TR for details.

• Bias/gain normalisation: I’ = (I – μ)/σ

• Wavelet transform


**Detections at multiple scales**

**Summary**

• Multi-scale Harris corner detector

• Sub-pixel refinement

• Orientation assignment by gradients

• Blurred intensity patch as descriptor

**Feature matching**

• Exhaustive search

– *for each feature in one image, look at all the other features in the other image(s)*

• Hashing

– compute a short descriptor from each feature vector, or hash longer descriptors (randomly)

• Nearest neighbor techniques

– *k*-d trees and their variants (Best Bin First)

**Wavelet-based hashing**

• Compute a short (3-vector) descriptor from an 8x8 patch using a Haar “wavelet”

• Quantize each value into 10 (overlapping) bins
(10^{3} total entries)

• [Brown, Szeliski, Winder, CVPR’2005]
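A rough illustration of the hashing idea: reduce an 8x8 descriptor patch to three coarse Haar-like coefficients and quantize them into a small table key. The coefficient choice, value range, and bin handling here are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def haar_hash(patch, n_bins=10):
    """Hash an 8x8 patch to a 3-tuple of quantized Haar-like coefficients."""
    p = np.asarray(patch, dtype=float)
    horiz = p[:, :4].mean() - p[:, 4:].mean()                      # left - right
    vert = p[:4, :].mean() - p[4:, :].mean()                       # top - bottom
    diag = (p[:4, :4].mean() + p[4:, 4:].mean()
            - p[:4, 4:].mean() - p[4:, :4].mean())                 # checkerboard
    coeffs = np.array([horiz, vert, diag])
    # quantize each coefficient into n_bins bins over an assumed [-1, 1] range
    idx = np.clip(((coeffs + 1.0) / 2.0 * n_bins).astype(int), 0, n_bins - 1)
    return tuple(idx)            # key into a 10^3-entry hash table
```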

**Nearest neighbor techniques**

• *k*-d tree and Best Bin First (BBF)

Indexing Without Invariants in 3D Object Recognition, Beis and Lowe, PAMI’99

**Project #2 Image stitching**

• Assigned: 3/27

• Checkpoint: 11:59pm 4/15

• Due: 11:59am 4/24

• Work in pairs

**Reference software**

• Autostitch

http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html

• Many others are available online.

**Tips for taking pictures**

• Common focal point

• Rotate your camera to increase vertical FOV

• Tripod

• Fixed exposure?

**Bells & whistles**

• Recognizing panoramas

• Bundle adjustment

• Handle dynamic objects

• Better blending techniques

**Artifacts**

• Take your own pictures and generate a stitched image; be creative.

• http://www.cs.washington.edu/education/courses/cse590ss/01wi/projects/project1/students/allen/index.html

**Submission**

• You have to turn in your complete source code, the executable, an HTML report, and an artifact.

• The report page should contain:

a description of the project, what you learned, the algorithm, implementation details, results, bells and whistles, ...

• Artifacts must be made using your own program.

**Reference**

• Chris Harris, Mike Stephens, A Combined Corner and Edge Detector, 4th Alvey Vision Conference, 1988, pp. 147-151.

• David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60(2), 2004, pp. 91-110.

• Yan Ke, Rahul Sukthankar, PCA-SIFT: A More Distinctive Representation for Local Image Descriptors, CVPR 2004.

• Krystian Mikolajczyk, Cordelia Schmid, A Performance Evaluation of Local Descriptors, submitted to PAMI, 2004.

• SIFT Keypoint Detector, David Lowe.

• Matlab SIFT Tutorial, University of Toronto.