M3.13
AFFINE MODELS FOR IMAGE MATCHING AND MOTION DETECTION
Chiou-Shann Fuh
and Petros Maragos
Division of Applied Sciences, Harvard University, Cambridge, MA 02138, USA
ABSTRACT: A model is developed for detecting the displacement field in spatio-temporal image sequences that allows for affine shape deformations of corresponding spatial regions and for affine transformations of the image intensity range.
This model includes the block matching method as a special case.
A least-squares algorithm is used to find the model parameters.
It is experimentally demonstrated that the affine matching model performs better than other standard approaches. The resulting 2-D motion estimates are then used by a 3-D affine model and a least-squares algorithm that recover 3-D rigid body motion and depth from two perspective views.
1 Introduction
Motion detection is a very important problem both in video image coding and in computer vision. In video coding, motion detection is a necessary task for motion-compensated predictive coding and motion-adaptive frame interpolation to reduce the required channel bandwidth. In computer vision systems, motion detection can be used to infer the 3-D motion and surface structure of moving objects with many applications to robot guidance and remote sensing.
There is a vast literature on motion detection; see [1][7] for reviews. The major approaches to computing displacement vectors for corresponding pixels in two time-consecutive image frames can be classified as either using gradient-based methods (which include pixel-recursive algorithms) [2][4][5][8], or correspondence of motion tokens [3][9], or block matching methods [10][7].
Let $I(x, y, t)$ be a spatio-temporal intensity image signal due to a moving object, where $p = (x, y)$ is the (spatial) pixel vector. A well-known method to estimate 2-D velocities or pixel displacements on the image plane is the block matching method, where

$$E(d) = \sum_{p \in R} |I(p, t_1) - I(p + d, t_2)|^2$$

is minimized over a small spatial region $R$ to find the optimum displacement vector $d$. Minimizing $E(d)$ is closely related to finding $d$ such that the correlation $\sum_{p \in R} I(p, t_1)\, I(p + d, t_2)$ is maximized; thus, it is sometimes called the area correlation method. This approach has been criticized because (i) it is computation-intensive; (ii) it ignores that the region $R$, which is the projection of the moving object at time $t = t_1$, will correspond to another region $R'$ at $t = t_2$ with deformed shape due to foreshortening of the object surface regions as viewed at two different time instances; (iii) the image signals corresponding to regions $R$ and $R'$ do not only differ with respect to their supports $R$ and $R'$, but also undergo amplitude transformations due to the different lighting and viewing geometries at $t_1$ and $t_2$.
This research was supported by the National Science Foundation under Grant MIPS-86-58150 with matching funds from Bellcore, DEC, TASC, and Xerox, and in part by the Army Research Office under Grant DAAL03-86-K-0171 to the Center for Intelligent Control Systems.
Nowadays, (i) is no longer critical due to the availability of very fast hardware or parallel computers, but (ii) and (iii) are serious drawbacks. Several researchers have adopted other methods that depend either on (a) constraints among spatio-temporal image gradients, or on (b) tracking features (e.g., edges, blobs).
However, (a) performs badly for medium- or long-range motion and is sensitive to noise. Approach (b) is more robust to noise and works for longer-range motion, but feature extraction and tracking are difficult tasks and give sparse motion estimates. By comparison, if problems (ii) and (iii) can be solved, then the block matching method has the advantages of greater robustness than (a) and denser motion estimates than (b).
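As a point of reference, the standard block matching criterion $E(d)$ can be sketched as an exhaustive search over integer displacements. This is a minimal illustration, not the implementation used in the experiments; the function name `block_match`, the square block shape, and the search-range parameter `max_disp` are assumptions introduced here.

```python
import numpy as np

def block_match(frame1, frame2, top, left, size, max_disp):
    """Exhaustive search for the integer displacement d = (dx, dy)
    minimizing E(d) = sum over the block of |I(p, t1) - I(p + d, t2)|^2."""
    block = frame1[top:top + size, left:left + size].astype(np.float64)
    best_d, best_err = (0, 0), np.inf
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            r, c = top + dy, left + dx
            # Skip candidates whose block would fall outside frame2.
            if (r < 0 or c < 0 or
                    r + size > frame2.shape[0] or c + size > frame2.shape[1]):
                continue
            cand = frame2[r:r + size, c:c + size].astype(np.float64)
            err = np.sum((block - cand) ** 2)  # sum of squared differences
            if err < best_err:
                best_d, best_err = (dx, dy), err
    return best_d, best_err
```

The search only translates the block, which is exactly the limitation noted in (ii), and it compares raw intensities, which is the limitation noted in (iii).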
In this paper, we present an improved model for block matching that solves problems (ii) and (iii) by allowing R to undergo affine shape deformations (as opposed to just translations that the block matching method assumes) and by allowing the intensity signal I to undergo affine amplitude transformations. The parameters for this affine model are found via a least-squares algorithm. Several experiments are reported that demonstrate the superiority of our affine model for image matching and motion detection over gradient-based, feature-tracking, or standard block matching methods. Finally, we apply the previous results to recovering the 3-D rigid body motion parameters and depth from two perspective views by using a 3-D affine model whose input 2-D motion correspondences are the displacement vectors that resulted from our affine matching model.
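The affine amplitude transformation can be illustrated in isolation. The sketch below assumes the two regions are already spatially aligned and that their intensities are related by a gain and an offset, `patch1 ≈ r * patch2 + c`; the names `fit_amplitude`, `r`, and `c`, and the direction of the fit, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def fit_amplitude(patch1, patch2):
    """Least-squares gain r and offset c such that patch1 ~ r * patch2 + c.
    Returns (r, c, residual sum of squares)."""
    x = patch2.astype(np.float64).ravel()
    y = patch1.astype(np.float64).ravel()
    # Linear system [x 1] [r c]^T = y, solved in the least-squares sense.
    A = np.column_stack([x, np.ones_like(x)])
    (r, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = np.sum((y - (r * x + c)) ** 2)
    return r, c, residual
```

With `r = 1` and `c = 0` fixed, the residual reduces to the plain squared difference used by standard block matching, which is why a brightness change between frames can mislead the standard criterion.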
2 Affine Model for Image Matching
We assume that the region $R'$ at $t = t_2$ has resulted from the region $R$ at $t = t_1$ via an affine shape deformation $p \mapsto Mp + d$. The vector $d = (d_x, d_y)$ accounts for spatial translations, whereas the $2 \times 2$ real matrix $M$ accounts for rotations and scalings (compressions or expansions). The matrix $M$ is parameterized by the scaling ratios $s_x$, $s_y$ in the $x$, $y$ directions and by the corresponding rotation angles $\theta_x$, $\theta_y$. These kinds of region deformations occur in a moving image sequence. For example, when objects rotate relative to the camera, the region $R$ also rotates. When objects move closer to or farther from the camera, the region $R$ gets scaled (expanded or compressed). Displacements by $d$ can be caused by translations
CH2977-7/91/0000-2409 $1.00 © 1991 IEEE
Authorized licensed use limited to: National Taiwan University. Downloaded on December 19, 2009 at 05:01 from IEEE Xplore. Restrictions apply.
3 of 3-D
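The affine shape deformation $p \mapsto Mp + d$ of Section 2 can be sketched as follows. The explicit form of $M$ did not survive in this copy of the text, so the parameterization in `affine_matrix` (one scaled, rotated column per axis) is an assumption chosen to be consistent with the description of $s_x$, $s_y$, $\theta_x$, $\theta_y$; the function names are likewise hypothetical.

```python
import numpy as np

def affine_matrix(sx, sy, theta_x, theta_y):
    """ASSUMED parameterization of the 2x2 matrix M: the x axis is scaled
    by sx and rotated by theta_x, the y axis by sy and theta_y."""
    return np.array([[sx * np.cos(theta_x), -sy * np.sin(theta_y)],
                     [sx * np.sin(theta_x),  sy * np.cos(theta_y)]])

def deform_region(points, M, d):
    """Map each pixel vector p = (x, y), stored as a row of `points`,
    to M p + d."""
    return points @ M.T + np.asarray(d, dtype=np.float64)
```

Under this assumed form, setting `sx = sy = 1` and `theta_x = theta_y = 0` gives the identity matrix, so the deformation reduces to a pure translation by $d$, the special case handled by standard block matching.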
Fig. 1. (a) “Poster” (frame 1) under dim light sources (242 x 242 pixels, 8 bits/pixel). (d) “Poster” (frame 2) with small rotation and under much brighter light sources. Displacement vectors between images 1a and 1d using (b) standard block matching and (c) the affine matching algorithm. (g) “Poster” (frame 3) after camera moved closer to the object. Displacement vectors between images 1d and 1g using (e) standard block matching and (f) the affine matching algorithm. (j) “Poster” (frame 4) after a 23° counterclockwise rotation. Displacement vectors between images 1g and 1j using: (h) standard block matching, (i) the affine matching algorithm, (k) a feature-based correspondence algorithm [3], and (l) a gradient-based optical flow algorithm [4].