
M3.13

AFFINE MODELS FOR IMAGE MATCHING AND MOTION DETECTION

Chiou-Shann Fuh and Petros Maragos

Division of Applied Sciences, Harvard University, Cambridge, MA 02138, USA

ABSTRACT: A model is developed for detecting the displacement field in spatio-temporal image sequences that allows for affine shape deformations of corresponding spatial regions and for affine transformations of the image intensity range. This model includes the block matching method as a special case. A least-squares algorithm is used to find the model parameters. It is experimentally demonstrated that the affine matching model performs better than other standard approaches. The resulting 2-D motion estimates are then used by a 3-D affine model and a least-squares algorithm that recover 3-D rigid body motion and depth from two perspective views.

1 Introduction

Motion detection is a very important problem both in video image coding and in computer vision. In video coding, motion detection is a necessary task for motion-compensated predictive coding and motion-adaptive frame interpolation to reduce the required channel bandwidth. In computer vision systems, motion detection can be used to infer the 3-D motion and surface structure of moving objects, with many applications to robot guidance and remote sensing.

There is a vast literature on motion detection; see [1][7] for reviews. The major approaches to computing displacement vectors for corresponding pixels in two time-consecutive image frames can be classified as using either gradient-based methods (which include pixel-recursive algorithms) [2][4][5][8], correspondence of motion tokens [3][9], or block matching methods [10][7].

Let $I(x, y, t)$ be a spatio-temporal intensity image signal due to a moving object, where $p = (x, y)$ is the (spatial) pixel vector. A well-known method to estimate 2-D velocities or pixel displacements on the image plane is the block matching method, where

$$E(d) = \sum_{p \in R} |I(p, t_1) - I(p + d, t_2)|^2$$

is minimized over a small spatial region $R$ to find the optimum displacement vector $d$. Minimizing $E(d)$ is closely related to finding $d$ such that the correlation $\sum_{p \in R} I(p, t_1) I(p + d, t_2)$ is maximized; thus, it is sometimes called the area correlation method.
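As a rough illustration of the block matching criterion above, the following sketch searches exhaustively over integer displacements for the one minimizing $E(d)$. The function name `block_match`, the square block shape, and the search radius `max_disp` are assumptions made for this example only; the text does not prescribe a particular search strategy.

```python
import numpy as np

def block_match(frame1, frame2, top, left, size, max_disp):
    """Exhaustive integer search for d = (dx, dy) minimizing the squared error E(d).

    The region R is the size x size block of frame1 whose upper-left corner
    is at row `top`, column `left` (a hypothetical choice of region shape).
    """
    block = frame1[top:top + size, left:left + size].astype(float)
    best_d, best_err = (0, 0), np.inf
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall outside the second frame.
            if r < 0 or c < 0 or r + size > frame2.shape[0] or c + size > frame2.shape[1]:
                continue
            cand = frame2[r:r + size, c:c + size].astype(float)
            err = np.sum((block - cand) ** 2)  # E(d) for this candidate d
            if err < best_err:
                best_d, best_err = (dx, dy), err
    return best_d, best_err
```

The cost grows with the square of the search radius, which is the computational burden that criticism (i) in the following paragraph refers to.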

This approach has been criticized because (i) it is computation-intensive; (ii) it ignores that the region $R$, which is the projection of the moving object at time $t = t_1$, will correspond to another region $R'$ at $t = t_2$ with deformed shape due to foreshortening of the object surface regions as viewed at two different time instances; (iii) the image signals corresponding to regions $R$ and $R'$ not only differ with respect to their supports $R$ and $R'$, but also undergo amplitude transformations due to the different lighting and viewing geometries at $t_1$ and $t_2$. Nowadays, (i) is no longer critical due to the availability of very fast hardware or parallel computers, but (ii) and (iii) are serious drawbacks. Several researchers have adopted other methods that depend either on (a) constraints among spatio-temporal image gradients, or on (b) tracking features (e.g., edges, blobs). However, (a) performs badly for medium- or long-range motion and is sensitive to noise. Approach (b) is more robust to noise and works for longer-range motion, but feature extraction and tracking is a difficult task and gives sparse motion estimates. By comparison, if problems (ii) and (iii) can be solved, then the block matching method has the advantages of greater robustness than (a) and denser motion estimates than (b).

* This research was supported by the National Science Foundation under Grant MIPS-86-58150 with matching funds from Bellcore, DEC, TASC, and Xerox, and in part by the Army Research Office under Grant DAAL03-86-K-0171 to the Center for Intelligent Control Systems.

In this paper, we present an improved model for block matching that solves problems (ii) and (iii) by allowing $R$ to undergo affine shape deformations (as opposed to just the translations that the block matching method assumes) and by allowing the intensity signal $I$ to undergo affine amplitude transformations. The parameters for this affine model are found via a least-squares algorithm. Several experiments are reported that demonstrate the superiority of our affine model for image matching and motion detection over gradient-based, feature-tracking, or standard block matching methods. Finally, we apply the previous results to recovering the 3-D rigid body motion parameters and depth from two perspective views by using a 3-D affine model whose input 2-D motion correspondences are the displacement vectors that resulted from our affine matching model.
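To make the two kinds of affine change concrete, here is a minimal sketch that evaluates a matching error for one candidate shape deformation $p \mapsto Mp + d$ while fitting an affine amplitude transformation by least squares. It is not the authors' algorithm: the symbols `r` (gain) and `c` (offset), the assumed form $I(p, t_1) \approx r\, I(Mp + d, t_2) + c$, the function name `affine_match_error`, and the nearest-neighbour sampling are all assumptions for illustration. The search over $M$ and $d$ themselves is not shown.

```python
import numpy as np

def affine_match_error(frame1, frame2, points, M, d):
    """Return (error, r, c) for a candidate shape deformation p -> M p + d.

    points: (N, 2) array of integer (x, y) pixel coordinates forming region R
    in frame1. The gain r and offset c are hypothetical names for the affine
    amplitude parameters, fitted here in closed form for fixed M and d.
    """
    pts = np.asarray(points, dtype=float)
    warped = pts @ np.asarray(M, dtype=float).T + np.asarray(d, dtype=float)
    # Nearest-neighbour sampling keeps the sketch short (an assumption).
    xi = np.rint(warped[:, 0]).astype(int)
    yi = np.rint(warped[:, 1]).astype(int)
    h, w = frame2.shape
    if xi.min() < 0 or yi.min() < 0 or xi.max() >= w or yi.max() >= h:
        return np.inf, 1.0, 0.0  # deformed region leaves the second frame
    a = frame1[pts[:, 1].astype(int), pts[:, 0].astype(int)].astype(float)
    b = frame2[yi, xi].astype(float)
    # Least-squares fit of a ~ r * b + c over the region.
    A = np.column_stack([b, np.ones_like(b)])
    (r, c), *_ = np.linalg.lstsq(A, a, rcond=None)
    err = float(np.sum((a - (r * b + c)) ** 2))
    return err, float(r), float(c)
```

With `M` set to the identity, `r = 1`, and `c = 0`, this error reduces to the block matching error, which is the sense in which block matching is a special case of the affine model.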

2 Affine Model for Image Matching

We assume that the region $R'$ at $t = t_2$ has resulted from the region $R$ at $t = t_1$ via an affine shape deformation $p \mapsto Mp + d$. The vector $d = (d_x, d_y)$ accounts for spatial translations, whereas the $2 \times 2$ real matrix $M$ accounts for rotations and scalings (compressions or expansions): its parameters $s_x$, $s_y$ are the scaling ratios in the $x$, $y$ directions, and $\theta_x$, $\theta_y$ are the corresponding rotation angles. These kinds of region deformations occur in a moving image sequence. For example, when objects rotate relative to the camera, the region $R$ also rotates. When objects move closer to or farther from the camera, the region $R$ gets scaled (expanded or compressed). Displacements by $d$ can be caused by translations

CH2977-7/91/0000-2409 $1.00 © 1991 IEEE


3 of 3-D

Figure 1. (a) “Poster” (frame 1) under dim light sources (242 × 242 pixels, 8 bits/pixel). (d) “Poster” (frame 2) with small rotation and under much brighter light sources. Displacement vectors between images 1a and 1d using (b) standard block matching and (c) the affine matching algorithm. (g) “Poster” (frame 3) after camera moved closer to the object. Displacement vectors between images 1d and 1g using (e) standard block matching and (f) the affine matching algorithm. (j) “Poster” (frame 4) after a 23° counterclockwise rotation. Displacement vectors between images 1g and 1j using: (h) standard block matching, (i) the affine matching algorithm, (k) a feature-based correspondence algorithm [3], and (l) a gradient-based optical flow algorithm [4].
