A neural fuzzy system for image motion estimation

C.T. Lin, I.F. Chung, L.K. Martin Sheu

Department of Electrical and Control Engineering, National Chiao-Tung University, 1001 Ta Hseuh Road, Hsinchu, Taiwan

Received April 1998; received in revised form November 1998

Abstract

Many methods for computing optical flow (image motion vector) have been proposed, and others continue to appear. Block-matching methods are widely used because of their simplicity and easy implementation. In block-matching methods, the motion vector is uniquely defined by the best fit of a small reference subblock from a previous image frame in a larger search region from the present image frame. Hence, this method is very sensitive to real environments (involving occlusion, specularity, shadowing, transparency, etc.). In this paper, a neural fuzzy system with robust characteristics and learning ability is combined with the block-matching method to make a system that adapts to different circumstances. In the neural fuzzy motion estimation system, each subblock in the search region is assigned a similarity membership that contributes to the motion vector to a different degree. This system is more reliable, robust, and accurate in motion estimation than many other methods, including Horn and Schunck's optical flow, the fuzzy logic motion estimator (FME), best block matching, NR, and fast block matching. Since fast block-matching algorithms can be used to reduce search time, a three-step fast search method is employed to find the motion vector in our system. However, the candidate motion vector is often trapped in a local minimum, which makes the motion vector undesirable. An improved three-step fast search method is tested to reduce the effect of local minima, and some comparisons of fast search algorithms are made. In addition, a Quarter Compensation Algorithm is proposed for compensating the interframe image, to tackle the problem that the motion vector is not an integer but a floating-point number. Since our system can give an accurate motion vector, the motion information may be used in many different applications such as motion compensation, CCD camera auto-focusing or zooming, moving object extraction, etc. Two application examples are illustrated in this paper.
© 2000 Elsevier Science B.V. All rights reserved.

Keywords: Optical flow; Motion vector; Block matching; Membership function; Backpropagation; Affine motion

1. Introduction

The measurement of optical flow (or image motion vector) is a fundamental problem in the processing of image sequences. The goal is to compute an approximation to the 2-D motion field, a projection of the 3-D velocities of surface points onto the imaging surface. Several methods exist for computing optical flow: block-matching methods, differential techniques, etc. [3]. Regrettably, however, these systems are usually sensitive to noise, do not easily converge to the desired motion estimation, and have poor accuracy. This paper introduces a neural fuzzy system to attack these problems.

0165-0114/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S 0165-0114(99)00075-5

Many methods for computing optical flow have been proposed and classified by Barron [3]. Differential techniques, phase-based and energy-based methods, and region-based matching techniques are the most widely used means for computing the 2-D motion field. Differential techniques compute velocity from spatiotemporal derivatives of image intensity or filtered versions of the image (using low-pass or band-pass filters) [8,16,26,27,30,31]. One requirement of differential techniques is that $I(x, t)$ (image intensity at pixel location $x$ and time $t$) must be differentiable. This implies that temporal smoothing at the sensor is needed to avoid aliasing and that numerical differentiation must be done carefully. Another problem of differential techniques is that more accurate results require more than two spatiotemporal patterns of image intensity. Regrettably, storing more than two patterns occupies much memory and takes much time. In phase-based methods, image velocity is defined in terms of the phase behavior of band-pass filter outputs. The zero-crossing techniques [6] are classified as phase-based methods because zero crossings can be viewed as level phase crossings. Energy-based techniques, which rely on the output energy of velocity-tuned filters, are also called frequency-based methods owing to the design of velocity-tuned filters in the Fourier domain [9]. Region-based matching methods attempt to find the best fit of a small reference subblock from a previous image frame in a larger search region from the present image frame [10,18,19]. More precisely, an image frame is segmented into small two-dimensional blocks of $N \times N$ pixels. Each block searches for the displacement that produces the best match among the candidate blocks in the present frame. Fast block-matching algorithms can be used to reduce search time [18,19]. However, fast search algorithms are easily trapped in local minima and thus produce considerable error.

Because of noise or aliasing in the image acquisition process, finding a perfect match is impossible. Since fuzzy logic gives greater generality, higher expressive power, an enhanced ability to model real-world problems and, most importantly, a methodology for exploiting the tolerance for imprecision, it can be applied to image motion estimation. The fuzzy logic motion estimator (FME) proposed by Lipp [23] is a method based on a block-matching algorithm linked with fuzzy logic. In the FME, for each subblock comparison, the fuzzy logic system classifies an ($M \times M$) subblock in the search region as to its potential membership in the subblock being matched, using a fuzzy membership function (FMF). The FME has shown more accurate motion vector estimates with uniform and affine modeled frame-to-frame image motion of real-image data than those from Horn and Schunck's optical flow algorithm [16] or Netravali and Robbins' pel recursive algorithm [28]. In the FME system, however, the generation of membership for each subblock comparison is quite subjective; this subjectivity in turn harms the accuracy of the motion estimation, even though the FME technique performs better than conventional methods.

This paper integrates the conventional region-based matching method and a five-layered neural fuzzy network into a system to estimate image motion. The backpropagation (BP) learning rule is used for choosing proper membership functions so that the system adapts to different environments. Since the fast block-matching algorithm is a good choice for reducing search time, it is adopted in this system. However, the local minimum problem appears in the searching process of conventional fast block-matching algorithms and results in considerable error in the motion estimation, so an improved fast block-matching method is proposed to lessen it. Some comparisons of fast search algorithms are made. This system is more reliable and accurate for image motion estimation than other methods, including Horn and Schunck's optical flow algorithm, best block matching, Netravali and Robbins' (NR) pel recursive algorithm, FME, and the three-step fast block-matching algorithm. In summary, the significant characteristics of the proposed system are that it is more robust than several previous methods; it estimates the motion vector more accurately than several other methods; its parameters are tunable through neural learning; each subblock in the search region is assigned a similarity membership that contributes to the motion estimation to a different degree; and it adopts a modified fast block-matching method that reduces the local minimum problem.

The architecture and learning algorithm of the proposed system are described in Section 2. Some performance comparisons are made in Section 3. In Section 4, the applications of the proposed system are illustrated. Conclusions are summarized in Section 5.

2. The architecture of the neural fuzzy motion estimator

Fig. 1 shows the system architecture of the proposed neural fuzzy motion estimator. It is a five-layered network structure. Fig. 2 is the functional block diagram of this system. First, two image sequences containing moving objects are processed by a region-based matching technique that attempts to find a best fit of a small reference subblock from a previous image frame in a larger search region from the present image frame. Second, the outputs from region-based matching are fed into a neural fuzzy motion estimator, which computes the motion field of the image sequences.

Nodes at Layer 1 are input units representing inputs in linguistic variables. Nodes at Layer 2 are input term nodes, which act as membership functions for performing the similarity measure of each subblock in the search region of the present image frame. Layer 3 represents the rule nodes. Layer 4 contains output term nodes, and Layer 5 contains two output nodes producing the crisp image velocity values in the $x_1$ and $x_2$ directions, respectively. In the proposed system, the motion vector is not decided merely by the best fit of the reference subblock from a previous image frame in the larger search region of the present image frame. For each subblock matching, a fuzzy membership function (FMF) gives a potential membership to each subblock in the search region. A potential membership represents the confidence measure of a motion vector. This is performed over all subblocks in the search region. The matching degrees then pass through the fuzzy reasoning process (Layers 3 and 4) and the defuzzification process (Layer 5) to obtain two crisp outputs; these outputs indicate the motion vector of the reference subblock in the search region. To train (or, more precisely, to calibrate) the proposed network, a known, uniformly displaced and/or affine modeled frame-to-frame image is used to get the desired motion vectors for learning the proper parameters of the FMFs. Moreover, a modified fast search scheme is used in the present system. In both learning and processing, the fast search scheme increases the learning rate or reduces the search time, provided it finds the global minimum of the matching criterion correctly in the region-based matching algorithm. Each part of this system is now described in greater detail.

2.1. Block-matching Algorithm

A block-matching algorithm attempts to find a best fit of a small reference subblock from a previous image frame in a larger search region from the present image frame, as depicted in Fig. 3. As the figure shows, this algorithm compares the ($M \times M$) reference subblock with each same-sized portion within the ($N \times N$) search region. In general, the search region is larger than the reference subblock but much smaller than the image frame containing it. There are many choices for determining the best match, e.g., the cross-correlation function, the mean square error (MSE), the mean absolute difference (MAD) [5], etc. If one chooses the mean square error (MSE) as the matching criterion, one has

$$ \mathrm{MSE}(d_1, d_2) = \frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} \left[ I_s(i + d_1, j + d_2) - I_r(i, j) \right]^2, \tag{1} $$

where $I_r$ is the reference subblock in the previous image frame, $I_s$ is the search region in the present image frame, $N \times N$ is the size of the search region, and $M \times M$ is the size of the reference subblock. The displacement $(d_1, d_2)$ which minimizes $\mathrm{MSE}(d_1, d_2)$ is selected as the motion vector.
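The full search over Eq. (1) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names `mse` and `full_search`, the toy image data, and the choice to index offsets from the top-left corner of the search region (rather than as signed displacements about its center) are all assumptions made here.

```python
def mse(search, ref, d1, d2):
    """Eq. (1): mean square error between the M x M reference block and the
    same-sized window of the search region at offset (d1, d2)."""
    m = len(ref)
    total = 0.0
    for i in range(m):
        for j in range(m):
            diff = search[i + d1][j + d2] - ref[i][j]
            total += diff * diff
    return total / (m * m)


def full_search(search, ref):
    """Evaluate every offset that keeps the window inside the N x N search
    region; return the offset with the smallest MSE and the whole MSE map."""
    n, m = len(search), len(ref)
    mse_map = {}
    for d1 in range(n - m + 1):
        for d2 in range(n - m + 1):
            mse_map[(d1, d2)] = mse(search, ref, d1, d2)
    best = min(mse_map, key=mse_map.get)
    return best, mse_map


# Toy data (hypothetical): a 6 x 6 search region and a 2 x 2 reference block
# copied from rows 3-4, columns 1-2 of that region.
search = [[10 * r * r + c * c for c in range(6)] for r in range(6)]
ref = [row[1:3] for row in search[3:5]]
best, mse_map = full_search(search, ref)
```

With clean data such as this, the best match has zero error; the MSE map itself is what the neural fuzzy stage described next takes as input.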

Fig. 1. System architecture of the proposed neural fuzzy motion estimator (NFME).

Layer 1 (Input layer): The nodes in this layer just transmit input values to the next layer directly. That is,

$$ f = u_i^{(1)} \quad \text{and} \quad a = f, \tag{2} $$

where $u_i^{(1)} = \mathrm{MSE}(d_1, d_2)$ and $i$ denotes the $i$th compared subblock, corresponding to displacement $(d_1, d_2)$ in the search region.

Layer 2 (Input term nodes): A single node is used to act as a membership function, and a bell-shaped Gaussian function is selected as the input membership function,

$$ B(x; m, \sigma) = e^{-\frac{1}{2} \cdot \frac{(x - m)^2}{\sigma^2}}, \tag{3} $$

where $x$ is the input to the node from Layer 1, and $m$ and $\sigma$ are the center and width of the bell-shaped function, respectively. In this layer, the center of the bell-shaped function is set to zero and the width is tuned in the learning procedure to fit different environments. Hence, we have

$$ f = -\frac{1}{2} \cdot \frac{\mathrm{MSE}(d_1, d_2)}{\sigma^2} \quad \text{and} \quad a \equiv \mathrm{member}(d_1, d_2) = e^{f}, \tag{4} $$

where $\mathrm{member}(d_1, d_2)$ is the membership value representing the confidence measure of each subblock that is displaced by $d_1$ and $d_2$ pixels along the horizontal and vertical axes, respectively. If $\mathrm{member}(d_1, d_2)$ is close to 1, then the subblock's motion vector probably equals $(d_1, d_2)$; if $\mathrm{member}(d_1, d_2)$ is close to 0, then the subblock probably does not have a motion vector equal to $(d_1, d_2)$.

Fig. 2. The functional diagram of the proposed neural fuzzy system for image motion estimation.

Fig. 3. Block-matching algorithm.

Layer 3 (Rule nodes): The fuzzy inference rules for generating fuzzy outputs are of this form:

If $\mathrm{member}(d_1, d_2)$ then $y_{x_1} = D_1$ and $y_{x_2} = D_2$,

where $\mathrm{member}(d_1, d_2)$ is given in Eq. (4), $D_i$ is the fuzzy number of $d_i$ (e.g., if $d_1 = 1$, then $D_1$ is the fuzzy number "1"), and $y_{x_1}$ and $y_{x_2}$ are the output linguistic variables representing the displacement (in pixels/time-step) along the horizontal and vertical axes, respectively. The output membership functions used in the present system are uniformly spaced Gaussian functions with overlap. The membership functions for the output linguistic variable $y_{x_1}$ are depicted in Fig. 4. The membership functions for $y_{x_2}$ are exactly the same.

Fig. 4. The output membership functions.

From the above fuzzy rule, there is only one input for each rule node in Layer 3 (see Fig. 1). Hence, the (rule) node in this layer just transmits the input values to the next layer directly. That is,

$$ f = u_i^{(3)} \quad \text{and} \quad a = f. \tag{5} $$

In total, there are $(N - M)^2$ rules for all possible displacements $(\mathrm{disp}_1, \mathrm{disp}_2)$ of the reference subblocks in the search region. In other words, there are $(N - M)^2$ rule nodes in Layer 3.

Layer 4 (Output term nodes): The links at Layer 4 perform the fuzzy OR operation to integrate the fired rules that have the same consequence. Hence, we have

$$ f = \sum \mathrm{member}(d_1, d_2) \quad \text{and} \quad a = f, \tag{6} $$

and the link weight is $w_i^{(4)} = 1$.

Layer 5 (Output nodes): The node at this layer performs the defuzzification process. The following function can be used to approximate the center-of-area (COA) defuzzification method:

$$ f = \sum w_{ij}^{(5)} u_i^{(5)} \quad \text{and} \quad a = \frac{f}{\sum u_i^{(5)}}, \tag{7} $$

where $w_{ij}^{(5)}$ is the link weight, assigned as the displacement value $d_i$.
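Taken together, Eqs. (4) and (7) turn a map of MSE values into a membership-weighted average of the candidate displacements. The sketch below illustrates that forward pass for one reference subblock under simplifying assumptions: the function name `nfme_output`, the dictionary layout, and the sample numbers are hypothetical, and a single shared width is used for all input membership functions.

```python
import math


def nfme_output(mse_map, sigma):
    """Forward pass of Layers 2-5 for one reference subblock.

    mse_map maps a candidate displacement (d1, d2) to its MSE (Eq. (1)).
    Layer 2 turns each MSE into a Gaussian membership (Eq. (4)); Layers 3-5
    then reduce to a membership-weighted average of the displacements
    (Eq. (7)), with the displacement values acting as link weights.
    """
    member = {d: math.exp(-0.5 * e / sigma ** 2) for d, e in mse_map.items()}
    total = sum(member.values())
    y1 = sum(d[0] * a for d, a in member.items()) / total
    y2 = sum(d[1] * a for d, a in member.items()) / total
    return (y1, y2), member


# Hypothetical 3 x 3 neighbourhood of MSE values: a perfect match at (1, 0)
# and a fairly good match at (2, 0); everything else matches poorly.
mse_map = {(d1, d2): 40.0 for d1 in (0, 1, 2) for d2 in (-1, 0, 1)}
mse_map[(1, 0)] = 0.0
mse_map[(2, 0)] = 10.0
motion, member = nfme_output(mse_map, sigma=3.0)
```

In this example the estimate along the first axis lands between 1 and 2, pulled toward the second-best candidate, which is how the system can return non-integer motion vectors where best-fit block matching returns only the integer winner.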

2.2. Learning unit

A Gaussian function is used as the fuzzy membership function (FMF) in Layer 2 (see Figs. 1 and 2), with the parameter $\sigma$ to be determined adaptively for various environments. The backpropagation (BP) learning rule is used to find the parameter's proper value. Consequently, training data indicating desired motion vectors are required for learning. The training data are obtained from a real image synthetically displaced with affine motion, which is defined by

$$ \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = \begin{bmatrix} 1 - \cos(\theta_r) & \sin(\theta_r) \\ -\sin(\theta_r) & 1 - \cos(\theta_r) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \tag{8} $$

where $[d_1, d_2]^T$ is the motion vector of the synthetic image, $\theta_r$ is the clockwise angle of rotation, and $u_1, u_2$ are the uniform components of the motion in the $x_1$ (vertical) and $x_2$ (horizontal) directions, respectively. The detailed derivation of the learning algorithm is shown below. The output error is defined as

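$$ E = \frac{1}{2}\left[y_{x_1} - \hat{y}_{x_1}\right]^2 + \frac{1}{2}\left[y_{x_2} - \hat{y}_{x_2}\right]^2, \tag{9} $$

where $y_{x_1}, y_{x_2}$ are the desired outputs in the $x_1$ and $x_2$ directions, respectively, and $\hat{y}_{x_1}, \hat{y}_{x_2}$ are the corresponding actual outputs, which are given by

$$ \hat{y}_{x_1} = \frac{\sum_i \sum_j m_i \sigma_w a_{ij}}{\sum_i \sum_j \sigma_w a_{ij}}, \tag{10} $$

$$ \hat{y}_{x_2} = \frac{\sum_i \sum_j m_j \sigma_w a_{ij}}{\sum_i \sum_j \sigma_w a_{ij}}, \tag{11} $$

$$ a_{ij} = e^{-(1/2)\, x_{ij}^2 / \sigma_{ij}^2}, \tag{12} $$

$$ x_{ij}^2 = \mathrm{MSE}(i, j), \tag{13} $$

where $\mathrm{MSE}(i, j)$ is the mean square error used as the matching criterion defined in Eq. (1), $\sigma_w$ is the width of the output membership functions, and $m_i, m_j$ are the centers of the output membership functions in the $x_1$ and $x_2$ directions, respectively. The backpropagation learning rule then gives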

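$$ \sigma'_{ij} = \sigma_{ij} + \Delta\sigma_{ij}, \tag{14} $$

$$ \Delta\sigma_{ij} = -\eta \frac{\partial E}{\partial \sigma_{ij}} \tag{15} $$

$$ = -\eta \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \sigma_{ij}} \tag{16} $$

$$ = -\eta \left( \frac{\partial E}{\partial \hat{y}_{x_1}} \frac{\partial \hat{y}_{x_1}}{\partial \sigma_{ij}} + \frac{\partial E}{\partial \hat{y}_{x_2}} \frac{\partial \hat{y}_{x_2}}{\partial \sigma_{ij}} \right) \tag{17} $$

$$ = \Delta\sigma^{1}_{ij} + \Delta\sigma^{2}_{ij}, \tag{18} $$

where $\Delta\sigma^{1}_{ij} = -\eta\, \partial E / \partial \hat{y}_{x_1} \cdot \partial \hat{y}_{x_1} / \partial \sigma_{ij}$ and $\Delta\sigma^{2}_{ij} = -\eta\, \partial E / \partial \hat{y}_{x_2} \cdot \partial \hat{y}_{x_2} / \partial \sigma_{ij}$, which can be calculated as follows:

1. $\partial E / \partial \hat{y}_{x_1}$:

$$ \frac{\partial E}{\partial \hat{y}_{x_1}} = -\left[y_{x_1} - \hat{y}_{x_1}\right]. \tag{19} $$

2. $\partial \hat{y}_{x_1} / \partial \sigma_{ij}$:

$$ \frac{\partial \hat{y}_{x_1}}{\partial \sigma_{ij}} = \frac{\partial \hat{y}_{x_1}}{\partial a_{ij}} \cdot \frac{\partial a_{ij}}{\partial \sigma_{ij}} = \left[y_{x_1} - \hat{y}_{x_1}\right] \sum_{i=1}^{M} \sum_{j=1}^{M} \left\{ \frac{\left(\sum_i \sum_j m_i\right) \cdot \sum_i \sum_j a_{ij} - \sum_{i=1}^{M} \sum_{j=1}^{M} m_i a_{ij}}{\left(\sum_{i=1}^{M} \sum_{j=1}^{M} a_{ij}\right)^2} \times e^{-(1/2)\, x_{ij}^2 / \sigma_{ij}^2} \times \frac{x_{ij}^3}{\sigma_{ij}^3} \right\}. \tag{20} $$

In Eq. (20), the term $\sum_i \sum_j m_i = 0$, because the output membership functions are chosen as Gaussian functions symmetrical with respect to the origin. Hence, $\Delta\sigma^{1}_{ij}$ becomes

$$ \Delta\sigma^{1}_{ij} = \eta \left[y_{x_1} - \hat{y}_{x_1}\right] \sum_{i=1}^{M} \sum_{j=1}^{M} \left\{ -\frac{\sum_{i=1}^{M} \sum_{j=1}^{M} m_i a_{ij}}{\left(\sum_{i=1}^{M} \sum_{j=1}^{M} a_{ij}\right)^2} \times e^{-(1/2)\, x_{ij}^2 / \sigma_{ij}^2} \times \frac{x_{ij}^3}{\sigma_{ij}^3} \right\}. \tag{21} $$

Similarly, $\Delta\sigma^{2}_{ij}$ can be derived as

$$ \Delta\sigma^{2}_{ij} = \eta \left[y_{x_2} - \hat{y}_{x_2}\right] \sum_{i=1}^{M} \sum_{j=1}^{M} \left\{ -\frac{\sum_{i=1}^{M} \sum_{j=1}^{M} m_j a_{ij}}{\left(\sum_{i=1}^{M} \sum_{j=1}^{M} a_{ij}\right)^2} \times e^{-(1/2)\, x_{ij}^2 / \sigma_{ij}^2} \times \frac{x_{ij}^3}{\sigma_{ij}^3} \right\}. \tag{22} $$

The final update rule is

$$ \Delta\sigma_{ij} = \Delta\sigma^{1}_{ij} + \Delta\sigma^{2}_{ij}. \tag{23} $$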

Eq. (23) is an update rule for the width of the FMFs. Since objects can move arbitrarily in image sequences, it is reasonable to let all the FMFs have the same width, $\sigma$. The common width parameter, $\sigma$, is updated by the average value of all $\Delta\sigma_{ij}$ for each epoch of learning.
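The idea of tuning one shared width by gradient descent on the output error of Eq. (9) can be illustrated numerically. The sketch below is not the closed-form rule of Eqs. (19)-(23): as a stated simplification it estimates the gradient by central finite differences and averages it over the training samples in each epoch. The function names, the learning rate `eta`, the step `h`, and the toy MSE map are all assumptions made for illustration.

```python
import math


def predict(mse_map, sigma):
    """Membership-weighted average displacement, as in Eqs. (10)-(12) with a
    shared input width (the output width cancels out of the ratio)."""
    weights = {d: math.exp(-0.5 * e / sigma ** 2) for d, e in mse_map.items()}
    total = sum(weights.values())
    return (sum(d[0] * w for d, w in weights.items()) / total,
            sum(d[1] * w for d, w in weights.items()) / total)


def error(mse_map, target, sigma):
    """Eq. (9): half the squared output error, summed over both directions."""
    y1, y2 = predict(mse_map, sigma)
    return 0.5 * (target[0] - y1) ** 2 + 0.5 * (target[1] - y2) ** 2


def train_sigma(samples, sigma, eta=5.0, epochs=500, h=1e-5):
    """Gradient descent on the shared width. The gradient is a central finite
    difference (an assumption of this sketch), averaged once per epoch."""
    for _ in range(epochs):
        grads = [(error(m, t, sigma + h) - error(m, t, sigma - h)) / (2 * h)
                 for m, t in samples]
        sigma -= eta * sum(grads) / len(grads)
    return sigma


# Hypothetical training sample: the target is generated with sigma = 2, so
# training started from sigma = 4 should move back toward 2.
mse_map = {(0, 0): 0.0, (1, 0): 4.0, (0, 1): 8.0, (1, 1): 12.0}
target = predict(mse_map, 2.0)
initial_error = error(mse_map, target, 4.0)
sigma = train_sigma([(mse_map, target)], sigma=4.0)
final_error = error(mse_map, target, sigma)
```

A finite-difference gradient is slower than an analytic one but is convenient for checking a hand-derived update rule against the error function it is supposed to minimize.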

2.3. Fast search algorithm

In the proposed neural fuzzy motion estimator, the motion vectors in image sequences are determined by all the confidence measures of candidate locations in the search region, and especially by those locations with greater membership values. The greatest membership falls at the location whose MSE value is the global minimum in the search region. Moreover, it is reasonable to assume that the significant memberships are at the locations around the one with the greatest membership. Hence, when the global minimum of MSE is found in the search region, we can use the $3 \times 3$, $5 \times 5$, ... MSEs centered at the global minimum of MSE to calculate the approximate motion vector, which will be close to the desired motion vector. If the global minimum of MSE can be found without a full search, the search time can be reduced and the learning rate can be increased. Hence, a modified fast search algorithm is incorporated into the learning unit of the present system to speed up both learning and processing.

To reduce the heavy computational cost resulting from the massive number of candidate locations, the three-step fast block search algorithm [19] searches for the best motion vector in a coarse-to-fine manner. Fig. 5 illustrates the procedure of the three-step search with an example motion vector of $(-7, 5)$. In the first step, nine sparsely located candidates are evaluated and the one with the minimal MSE is picked out. In the second step, the search focuses on the area centered at the winner of the previous step, but the distances between candidate locations are shortened by one-half. In the same manner, the third step compares the MSEs of the nine locations around the winner found in the second step and then gives the final motion vector. The three-step search algorithm commonly uses the range of $d_1 = d_2 = 7$. In this manner, the number of search locations decreases to 1/9 of the number in the full search approach.

In practice, many fast block-matching algorithms are trapped in local minima, the three-step search algorithm included. This paper thus proposes a modified fast search algorithm. In the second and third steps, the coarse-to-fine search is replaced by a full search to reduce the extent to which the searching process is trapped in local minima. With this modification, the number of search locations decreases to 1/5 of the number in the exhaustive search. Some comparisons of the full search, the original three-step search, and the modified three-step search are made in the following.
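The original three-step search and its 1/9 cost figure can be checked with a short sketch. Only the classic coarse-to-fine variant is shown, since the text does not specify the window used by the modified variant. The function name, the step sizes 4, 2, 1 (which give a range of 7 in each direction), and the smooth toy cost function are assumptions made for illustration.

```python
def three_step_search(cost, start=(0, 0), step=4):
    """Classic three-step search: evaluate the centre and its 8 neighbours at
    spacing `step`, move to the best candidate, halve the spacing, repeat."""
    best = start
    evaluated = {start}
    while step >= 1:
        candidates = [(best[0] + a * step, best[1] + b * step)
                      for a in (-1, 0, 1) for b in (-1, 0, 1)]
        evaluated.update(candidates)
        best = min(candidates, key=cost)
        step //= 2
    return best, len(evaluated)


# Hypothetical cost with a single minimum at (-7, 5), the example motion
# vector used in the text; a real cost would be the MSE of Eq. (1).
def cost(d):
    return (d[0] + 7) ** 2 + (d[1] - 5) ** 2


best, n_evaluated = three_step_search(cost)
full_search_locations = (2 * 7 + 1) ** 2  # 15 x 15 candidate displacements
```

The three steps visit 9 + 8 + 8 = 25 distinct locations, against 225 for the full search over the same range, which matches the 1/9 ratio quoted above. With a cost surface that has several minima, the first step can pick the wrong basin, which is the local minimum problem the modified algorithm targets.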

2.4. Experimental results of learning

The target motion vectors of the image sequences from the affine motion equation (Eq. (8)) can be found for updating the parameter, $\sigma$, of the fuzzy membership functions. The update rule is given by Eq. (23). Three cases are considered to show the learning ability of the proposed neural fuzzy motion estimator:

• Learning an image with different motion parameters: The training data are made using a real image [...] different uniform translation components $(u_1, u_2)$, namely $(2, 2)$, $(-2, 2)$, $(2, -2)$, $(-2, -2)$, are used to produce training patterns representing the general motion of moving objects when they move arbitrarily in the image.

• Learning two different images: The above image is replaced with a new one, the same experiment is repeated, and the value of $\sigma$ is observed.

• Learning an image using different numbers of rules: Different numbers of rules, $3 \times 3$ ($1 \times 1$ is the best block matching), $5 \times 5$, $7 \times 7$, and $9 \times 9$, are used for an image to see the value of $\sigma$.

Fig. 5. Three-step fast block search algorithm.

Fig. 6. (a), (b) Two different images for learning. (c), (d) Two rotated images for learning.

Two different images and two rotated ones for learning are shown in Fig. 6. Some training patterns, represented as motion field flow maps, are depicted in Fig. 7. The final values of $\sigma$ updated by different training patterns made by affine motion from two different images are listed in Table 1.

The average value of $\sigma$ is obtained by averaging all the values learned from different training patterns, and this average serves as the width parameter of the input membership functions. Two average values of $\sigma$ for two different images are observed in Table 1, indicating that the parameter of the fuzzy membership function must be chosen carefully. Hence, one should tune the parameters of the input membership functions to find a proper value for fitting different circumstances in practical applications.

Fig. 7. The desired motion vectors. (a) $(u_1, u_2) = (2, 2)$, $\theta_r = 4°$. (b) $(u_1, u_2) = (2, 2)$, $\theta_r = 8°$. (c) $(u_1, u_2) = (2, 2)$, $\theta_r = -4°$. (d) $(u_1, u_2) = (2, 2)$, $\theta_r = -8°$.

As observed from Table 2, different numbers of rules result in different values of $\sigma$. That is, if one wants to produce crisp values of motion estimation using different numbers of rules, then the value of $\sigma$ must be chosen properly.

2.5. Comparisons of various fast block-matching algorithms

The performance of fast search algorithms is affected by local minima. Hence, an algorithm with a bigger search range than the original three-step search algorithm was proposed to reduce the error. Using the modified three-step search algorithm described previously, the number of search locations increased from 1/9 to 1/5 of the number in the exhaustive approach, compared with the original three-step search algorithm. Two synthetically uniform translated images in Fig. 6 were used to examine the performance of the fast search and improved fast search algorithms. Regarding the full search algorithm as having perfect performance (i.e., a 0% error rate) in the searching process, the error percentages of the three-step and modified three-step methods with respect to the full search algorithm were obtained. The original three-step search scheme had a 54.32% error rate, while the modified method with the bigger search range had about 20.26% error. The local minimum problem still existed in the improved method. Hence, the learning rate could be increased effectively if the fast search algorithm worked well in the search process.

Table 2
The learned values of the width parameter $\sigma$ when learning using different numbers of rules

Rule no.    Image 1    Image 2
3 × 3       5.341      5.92
5 × 5       7.597      6.01
7 × 7       19.91      6.61
9 × 9       19.85      8.43

3. Test and comparisons of synthetic image sequences

This section examines the performance of the proposed and some existing image motion estimation techniques on synthetic image sequences for which the 2-D motion fields are known. Before discussing the experimental results, however, it is essential to describe the image sequences used for comparison and the measures of error.

3.1. Synthetic image sequences

The main advantage of using synthetic inputs is that the 2-D motion fields and scene properties can be controlled and tested in a methodical way. One may use the known motion vectors to quantify the performance of a specific algorithm. On the other hand, it must be kept in mind that such inputs are usually clean signals (involving no specularity, shadowing, transparency, occlusion, etc.), and therefore this measure of performance should be taken as an optimistic bound on the expected errors with real-image sequences. The synthetic images are made by means of affine motion (shown in Eq. (8)), and the synthetic image sequences include

• Translation sequences: Results of the case with velocity $v = (-3, 2)$ are reported (see Fig. 8). Hence, the desired motion vectors fall at location $(-3, 2)$ of the 2-D plane if one represents the motion vectors using a Cartesian coordinate system. The better the motion estimation is, the closer the estimated motion vectors are to the point $(-3, 2)$. This can be observed from the scatter map of the motion vectors.

• Rotation sequences: For this experiment, the image is rotated 6° clockwise using affine modeled motion to produce the present image frame (see Fig. 8).
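The desired motion field for such synthetic sequences follows directly from Eq. (8). The sketch below evaluates that equation, as printed, at a single pixel. The function name and argument order are assumptions made here, and the origin of the pixel coordinates (for example, the image center) is not stated in the text, so it is left to the caller.

```python
import math


def affine_displacement(x1, x2, theta_deg, u1, u2):
    """Displacement (d1, d2) at pixel (x1, x2) following Eq. (8) as printed:
    a rotation term built from theta_r plus a uniform translation (u1, u2)."""
    t = math.radians(theta_deg)
    c = 1.0 - math.cos(t)
    s = math.sin(t)
    d1 = c * x1 + s * x2 + u1
    d2 = -s * x1 + c * x2 + u2
    return d1, d2


# Pure translation: every pixel gets the same vector, here v = (-3, 2).
translation = affine_displacement(5, 7, 0.0, -3, 2)

# Pure rotation by 6 degrees: the displacement grows with distance from the
# coordinate origin, so block-sized integer matches are no longer exact.
rotation = affine_displacement(10, 0, 6.0, 0, 0)
```

Evaluating this function over all pixel locations gives the desired flow map against which each method's estimates can be compared.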

3.2. Image motion estimation techniques for comparisons

Synthetic image sequences of moving brightness patterns as mentioned above have been processed by many different methods, including Horn and Schunck's method, FME, Netravali and Robbins' pel recursive algorithm, best-fit block matching, and this paper's neural fuzzy motion estimator. These methods are described as follows.

Fig. 8. Testing images. (a) Original image. (b) Image after translation, $v = (-3, 2)$. (c) Image after rotation, $\theta_r = 6°$ clockwise.

• Horn and Schunck (HS) [16]: Horn and Schunck combined the gradient constraint with a global smoothness term to constrain the estimated velocity field $\mathbf{v} = (u(x, t), v(x, t))$ in minimizing the error equation. The gradient constraint is

$$ \nabla I(x, t) \cdot \mathbf{v} + I_t(x, t) = 0, \tag{24} $$

where $I_t(x, t)$ denotes the partial time derivative of $I(x, t)$, $\nabla I \cdot \mathbf{v}$ denotes the usual dot product, and $\nabla I(x, t) = (I_x(x, t), I_y(x, t))^T$. The total error to be minimized is

$$ \int_D \left[ (\nabla I \cdot \mathbf{v} + I_t)^2 + \lambda^2 \left( \|\nabla u\|_2^2 + \|\nabla v\|_2^2 \right) \right] dx \tag{25} $$

defined over a domain $D$, where the magnitude of $\lambda$ reflects the influence of the smoothness term. This study used $\lambda = 50$ because it produced better results in most test cases. Iterative equations are used to minimize Eq. (25) to obtain motion vectors:

$$ u^{k+1} = \bar{u}^k - \frac{I_x \left[ I_x \bar{u}^k + I_y \bar{v}^k + I_t \right]}{\lambda^2 + I_x^2 + I_y^2}, \tag{26} $$

$$ v^{k+1} = \bar{v}^k - \frac{I_y \left[ I_x \bar{u}^k + I_y \bar{v}^k + I_t \right]}{\lambda^2 + I_x^2 + I_y^2}, \tag{27} $$

where $k$ denotes the iteration number, $u^0$ and $v^0$ denote initial velocity estimates (here set to zero), and $\bar{u}^k, \bar{v}^k$ denote neighborhood averages of $u^k$ and $v^k$. At least 100 iterations were used in this study to obtain better results. The method with spatiotemporal aliasing was implemented and the subsequent derivative estimates [16] were also improved.

• Fuzzy logic motion estimator (FME) (Lipp [23]): Based on the block-matching algorithm, this technique calculates the displaced frame distortion (DFD) using

$$ \mathrm{DFD}(d_1, d_2) = \sum_{i=1}^{M} \sum_{j=1}^{M} \left[ I_s(i + d_1, j + d_2) - I_r(i, j) \right]^2, \tag{28} $$

where $(d_1, d_2)$ is the motion vector, $M$ is the reference block size, $I_r(i, j)$ is the reference subblock in the previous image frame and $I_s(i, j)$ is the search region in the present image frame (with origin [...] a normalization factor in defining the following membership value, which plays the same role as that in Eq. (4):

$$ \mathrm{member}(d_1, d_2) = \begin{cases} 1 - \dfrac{5\,\mathrm{DFD}(d_1, d_2)}{\mathrm{DFD}(0, 0)} & \text{if } \dfrac{5\,\mathrm{DFD}(d_1, d_2)}{\mathrm{DFD}(0, 0)} < 1, \\ 0 & \text{otherwise.} \end{cases} \tag{29} $$

In defuzzification, uniformly spaced triangular membership functions with little overlap (similar to those shown in Fig. 4) are used for the output membership functions. The outputs are combined using fuzzy centroid defuzzification to produce sharp outputs for directions $x_1$ (vertical) and $x_2$ (horizontal).

• Netravali and Robbins' pel recursive algorithm (NR) [28]: This motion estimation method attempts to minimize recursively a certain quantity (a function of the motion estimation error). If the changes in successive image frames are due to translation of an object, then the algorithm iterates in a gradient or steepest descent direction such that the consecutive estimates converge to an estimate of the translation. Note that since the NR approach considers only the point-to-point displacement, it relates to image translation only rather than to rotation. Assume $I(x, t)$ and $I(x, t - \tau)$ are the intensity values of the two successive frames as a function of spatial location $x$ (a two-dimensional vector) at time $t$. The time between the two frames is $\tau$. If an object moves in translation, then in the moving area one has

$$ I(x, t) = I(x - D, t - \tau), \tag{30} $$

where $D$ is the translation vector of the object during the time interval $[t - \tau, t]$. The frame difference at spatial position $x$ is given by

$$ \mathrm{FDIF}(x) = I(x, t) - I(x, t - \tau) \tag{31} $$

$$ = I(x, t) - I(x + D, t). \tag{32} $$

In the recursive estimation scheme, the displaced frame difference $\mathrm{DFD}(x, \hat{D}^{i-1})$, analogous to $\mathrm{FDIF}(x)$, is defined by

$$ \mathrm{DFD}(x, \hat{D}^{i-1}) = I(x, t) - I(x - \hat{D}^{i-1}, t - \tau), \tag{33} $$

$$ \hat{D}^{i} = \hat{D}^{i-1} + U^{i}, \tag{34} $$

where $\hat{D}^{i}$ is the $i$th displacement estimate, $\hat{D}^{i-1}$ is an initial estimate of $\hat{D}^{i}$, and $U^{i}$ is the update of $\hat{D}^{i-1}$ for making it more accurate (i.e., the estimate of $D - \hat{D}^{i-1}$). Then the estimator can be derived as

$$ \hat{D}^{i} = \hat{D}^{i-1} - \varepsilon\, \mathrm{DFD}(x_a, \hat{D}^{i-1})\, \nabla I(x_a - [\hat{D}^{i-1}], t - \tau), \tag{35} $$

where $\nabla$ is the gradient with respect to $x$, $\varepsilon$ is a positive scalar constant, and a pixel at location $x_a$ is predicted with displacement $\hat{D}^{i-1}$.

• BF (Best-fit block matching): This technique is widely used for motion estimation because of its simplicity and coding efficiency for motion information. A detailed description of this technique is in Section 2.

• Neural fuzzy motion estimator (NFME): The authors propose this new technique, introduced in Section 2, for image motion estimation. This system can learn proper system parameters for different circumstances and has been tuned using various training patterns obtained from the affine motion equation (Eq. (8)). The rotation angles $\theta_r$ = 4°, 8°, −4°, −8° and the uniform translation components $(u_1, u_2)^T$ = $(2, 2)$, $(-2, 2)$, $(2, -2)$, [...] the value $\sigma = 6.067$ was obtained as the parameter of the input fuzzy membership functions. Although the synthetic images for comparison were displaced uniformly by $(-3, 2)$ and rotated 6° clockwise, which differed from the training patterns, the comparisons below show that this system gives more accurate motion estimation. The comparisons also show that the proposed system is more robust than the others.
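Of the comparison methods listed above, the Horn and Schunck iteration of Eqs. (26) and (27) is the simplest to sketch. The following is a minimal illustration, not the implementation used in the study: the function names, the parameter name `lam` for the smoothness weight, the 4-neighbour average with border clamping, and the assumption that the derivative images `ix`, `iy`, `it` are supplied by the caller are all choices made here.

```python
def neighbour_mean(f, i, j):
    """Average of the 4-connected neighbours, clamping indices at the border
    (one simple choice of neighbourhood average; an assumption here)."""
    h, w = len(f), len(f[0])
    pts = [(max(i - 1, 0), j), (min(i + 1, h - 1), j),
           (i, max(j - 1, 0)), (i, min(j + 1, w - 1))]
    return sum(f[a][b] for a, b in pts) / 4.0


def horn_schunck(ix, iy, it, lam=50.0, iterations=100):
    """Iterate the two update equations from zero initial velocities.
    ix, iy, it are 2-D lists of spatial and temporal image derivatives."""
    h, w = len(ix), len(ix[0])
    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for _ in range(iterations):
        nu = [[0.0] * w for _ in range(h)]
        nv = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                ub = neighbour_mean(u, i, j)
                vb = neighbour_mean(v, i, j)
                # Residual of the gradient constraint, scaled by the
                # smoothness weight and the local gradient magnitude.
                r = (ix[i][j] * ub + iy[i][j] * vb + it[i][j]) / (
                    lam ** 2 + ix[i][j] ** 2 + iy[i][j] ** 2)
                nu[i][j] = ub - ix[i][j] * r
                nv[i][j] = vb - iy[i][j] * r
        u, v = nu, nv
    return u, v


# Hypothetical uniform derivatives satisfying Ix * 2 + It = 0, i.e. a true
# horizontal velocity of 2; a small weight is used so convergence is fast.
ones = [[1.0] * 3 for _ in range(3)]
zeros = [[0.0] * 3 for _ in range(3)]
minus_two = [[-2.0] * 3 for _ in range(3)]
u, v = horn_schunck(ones, zeros, minus_two, lam=0.1, iterations=100)
```

A large smoothness weight slows convergence considerably, which is consistent with the need for at least 100 iterations reported for this method.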

3.3. Comparison criteria and results

Some criteria must be defined to compare the performance of different methods. The five criteria used here are the root-mean-square velocity difference in the $x_1$-direction (err1), the root-mean-square velocity difference in the $x_2$-direction (err2), the average angle difference ($\theta_{err}$) of the motion vector, the root-mean-square error of velocity (RMSE), and the average motion vector $(\bar{V}_{x_1}, \bar{V}_{x_2})$ in the translation test. These five criteria are defined as follows:

$$ \mathrm{err1} = \sqrt{\frac{1}{n_1 \cdot n_2} \sum \sum \left[ DV_{x_1} - V_{x_1} \right]^2}, \tag{36} $$

$$ \mathrm{err2} = \sqrt{\frac{1}{n_1 \cdot n_2} \sum \sum \left[ DV_{x_2} - V_{x_2} \right]^2}, \tag{37} $$

$$ \theta_{err} = \frac{1}{n_1 \cdot n_2} \sum \sum \left| \tan^{-1}\!\left(\frac{DV_{x_2}}{DV_{x_1}}\right) - \tan^{-1}\!\left(\frac{V_{x_2}}{V_{x_1}}\right) \right|, \tag{38} $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n_1 \cdot n_2} \sum \sum \left( \sqrt{DV_{x_1}^2 + DV_{x_2}^2} - \sqrt{V_{x_1}^2 + V_{x_2}^2} \right)^2}, \tag{39} $$

$$ (\bar{V}_{x_1}, \bar{V}_{x_2}) = \left( \frac{1}{n_1 \cdot n_2} \sum_i \sum_j V_{x_1}(i, j),\ \frac{1}{n_1 \cdot n_2} \sum_i \sum_j V_{x_2}(i, j) \right), \tag{40} $$

where $DV_{x_1}$ and $DV_{x_2}$ are the desired motion vectors, $V_{x_1}$ and $V_{x_2}$ are the actual motion vectors, and $n_1 \times n_2$ is the size of an image frame. Three types of performance representation are used: (i) the motion vector flow map, which represents the motion vector at every location $(x_1, x_2)$ by a small arrow whose length and direction are proportional to the motion vector's magnitude and angle; (ii) the motion vector scatter map, which represents motion vectors on the Cartesian coordinate plane; and (iii) the error table, which lists all the errors in Eqs. (36)-(40) for comparison.
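These five criteria are straightforward to compute once the desired and estimated motion fields are available as lists of vectors. The sketch below is illustrative only: the function name, the flat-list input format, and the dictionary of results are assumptions, and `math.atan2` is swapped in for the inverse tangent of the component ratio so that a zero first component does not cause a division error (angle wrap-around is not handled).

```python
import math


def motion_errors(desired, actual):
    """desired/actual: equal-length lists of (v1, v2) motion vectors, one per
    image location. Returns the five comparison criteria as a dict."""
    n = len(desired)
    pairs = list(zip(desired, actual))
    err1 = math.sqrt(sum((d[0] - a[0]) ** 2 for d, a in pairs) / n)
    err2 = math.sqrt(sum((d[1] - a[1]) ** 2 for d, a in pairs) / n)
    # Mean absolute angle difference in degrees (atan2 is an assumption).
    angle = sum(abs(math.degrees(math.atan2(d[1], d[0]) -
                                 math.atan2(a[1], a[0])))
                for d, a in pairs) / n
    # Root-mean-square difference of the vector magnitudes.
    rmse = math.sqrt(sum((math.hypot(*d) - math.hypot(*a)) ** 2
                         for d, a in pairs) / n)
    mean = (sum(a[0] for a in actual) / n, sum(a[1] for a in actual) / n)
    return {"err1": err1, "err2": err2, "angle_err": angle,
            "rmse": rmse, "mean": mean}


# Hypothetical estimates around the desired translation (-3, 2).
desired = [(-3, 2)] * 4
actual = [(-3, 2), (-3, 2), (-2, 2), (-4, 2)]
errors = motion_errors(desired, actual)
```

Note that the average motion vector alone can hide errors: in this example it equals the desired vector exactly even though two of the four estimates are off by one pixel, which is why the root-mean-square criteria are reported alongside it.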

Fig. 9(a) and (b) show the flow map and scatter map, respectively, of the desired motion vectors when the image was uniformly displaced by −3 pixels in the vertical direction and 2 pixels in the horizontal direction. Fig. 9(c) is the flow map computed by the HS method, and Fig. 9(d) shows its scatter map. The motion estimation had err1 = 0.2283 pixels, err2 = 0.7839 pixels, θerr = 20.095 degrees, RMSE = 0.7191, and (V̄x1, V̄x2) = (−1.43, 1.35), as shown in Table 3. Figs. 9(e) and (f) show the motion estimation computed by the FME method, which had err1 = 0.3952 pixels, err2 = 0.6802 pixels, θerr = 12.448 degrees, RMSE = 0.7421, and (V̄x1, V̄x2) = (−2.51, 2.02). Figs. 10(g) and (h) are the flow map and scatter map representing the motion estimation of the NR recursive algorithm, which had err1 = 0.5009 pixels, err2 = 0.3555 pixels, θerr = 6.258 degrees, RMSE = 0.6877, and (V̄x1, V̄x2) = (−2.91, 2.35). These errors are listed in Table 3. Figs. 10(i) and (j) show the motion estimation resulting from the BF method. The corresponding errors are listed in Table 3. Fig. 10(j) and Table 3 indicate that the motion estimation computed by the BF method was error-free in the translation test. This flawlessness is due to the fact that the signal made from an integer-translated image is clean (with no shadowing, specularity, transparency, occlusion, etc.); consequently, BF finds a motion


Fig. 9. Translation test. (a) Desired flow map. (b) Desired scatter map. (c) HS flow map. (d) HS scatter map. (e) FME flow map. (f) FME scatter map.

vector which is a perfect match in the search region from the present image. As will be seen later, BF showed its shortcoming in the rotation test, where it could not find a perfect match. Figs. 10(k) and (l) are the flow map and scatter map representing the motion estimation of the NFME method, which had err1 = 0.0044 pixels, err2 = 0.0042 pixels, θerr = 0.065 degrees, RMSE = 0.0085, and (V̄x1, V̄x2) = (−3, 1.99). This result was very close to the desired one.

Fig. 11(a) is the flow map representing the desired motion vector when the image is rotated 6° clockwise. Figs. 11(b)–(f) are the estimated flow maps using various motion estimation methods. The corresponding error information is shown in Table 3. As observed from Table 3, the NFME method performs the best among all the compared methods. It was also found that the motion estimation obtained from the BF method was no longer error-free: this was because perfect-match motion vectors did not always exist when pixels moved in the rotation style. From the comparison results shown as flow maps, scatter maps, and an


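Table 3
Error table (columns marked "trans." refer to the translation test (−3, 2); columns marked "rot." refer to the rotation test, 6°)

| Method | err1 (trans.) | err2 (trans.) | θerr (trans.) | RMSE (trans.) | Mean (Vx, Vy) (trans.) | err1 (rot.) | err2 (rot.) | θerr (rot.) | RMSE (rot.) |
|---|---|---|---|---|---|---|---|---|---|
| HS | 0.2283 | 0.7839 | 20.095 | 0.7191 | (−1.43, 1.35) | 0.8136 | 1.0729 | 31.459 | 1.5315 |
| FME | 0.3592 | 0.6802 | 12.448 | 0.7421 | (−2.51, 2.02) | 0.8849 | 0.8789 | 34.703 | 1.0837 |
| BF | 0 | 0 | 0 | 0 | (−3, 2) | 0.3251 | 0.3805 | 17.556 | 0.4155 |
| NR | 0.5009 | 0.3555 | 6.258 | 0.6877 | (−2.91, 2.35) | 2.7433 | 3.5037 | 63.68 | 1.9491 |
| NFME | 0.0044 | 0.0042 | 0.065 | 0.0085 | (−3, 1.99) | 0.2833 | 0.3152 | 12.648 | 0.3597 |
| 3-step | 0.9648 | 0.7344 | 18.652 | 1.1971 | (−2.59, 1.85) | 0.8094 | 1.2068 | 33.657 | 1.2785 |
| Modified 3-step | 0.5926 | 0.5360 | 12.346 | 0.7539 | (−2.53, 1.48) | 0.6646 | 0.8984 | 27.100 | 0.9192 |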

Fig. 10. Translation test (continued). (g) NR flow map. (h) NR scatter map. (i) BF flow map. (j) BF scatter map. (k) NFME flow map. (l) NFME scatter map.


Fig. 11. Rotation test. (a) Desired flow map. (b) HS flow map. (c) FME flow map. (d) NR flow map. (e) BF flow map. (f) NFME flow map.

error table in Figs. 9–11, and Table 3, one can infer that the proposed NFME method is very robust and accurate in both the translation and rotation tests.

Performance comparisons of the three-step fast search and modified three-step fast search algorithms are shown in Table 3 and Figs. 12(a)–(f). The modified three-step algorithm reduces the error caused by local minima. The number of search locations was 1/5 of that of the exhaustive search approach.
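For orientation, the textbook three-step search on which such fast block-matching methods are based can be sketched as follows. This is a generic version, not the paper's exact or modified algorithm: the sum of absolute differences as the matching cost, the 4-2-1 step schedule, and the names `sad` and `three_step_search` are assumptions made for illustration, and the number of locations it visits need not match the figure quoted above.

```python
def sad(prev, cur, top, left, dy, dx, size):
    """Sum of absolute differences between the reference block in `prev`
    at (top, left) and the candidate block in `cur` displaced by (dy, dx)."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(prev[top + i][left + j] - cur[top + dy + i][left + dx + j])
    return total

def three_step_search(prev, cur, top, left, size, first_step=4):
    """Return the displacement (dy, dx) found by a coarse-to-fine search."""
    rows, cols = len(cur), len(cur[0])
    best = (0, 0)
    best_cost = sad(prev, cur, top, left, 0, 0, size)
    step = first_step
    while step >= 1:
        cy, cx = best  # centre of this step's 3 x 3 candidate pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cand = (cy + dy, cx + dx)
                y, x = top + cand[0], left + cand[1]
                # Skip candidates whose block would leave the frame.
                if y < 0 or x < 0 or y + size > rows or x + size > cols:
                    continue
                cost = sad(prev, cur, top, left, cand[0], cand[1], size)
                if cost < best_cost:
                    best, best_cost = cand, cost
        step //= 2  # halve the step: 4, 2, 1
    return best
```

Because each step only refines around the best candidate of the previous step, a poor early choice cannot be undone later, which is the local-minimum problem that the modified algorithm is meant to reduce.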

Affine motion with a large rotation angle, θr = 12°, was also tested in this study. Note that this rotation angle is outside the range of the training data used for training the NFME system in Section 2. The parameter, reference block size 11 × 11, and search region 31 × 31 were used to examine the performance of the various methods described in Section 3.2. The results are shown in Fig. 13 and Table 4. The same conclusion was reached: the NFME system provides the most accurate estimation when the image is rotated by a large angle. The average time taken by each motion estimation algorithm to compute the motion vectors of an image is listed in Table 5, where the programs were run on a PC486-66. The table shows that the proposed


Fig. 12. Tests of fast search algorithms. (a) 3-step flow map. (b) 3-step scatter map. (c) Modified 3-step flow map. (d) Modified 3-step scatter map. (e) 3-step flow map (rotation test). (f) Modified 3-step flow map (rotation test).

NFME system takes the same time as the FME and BF methods and longer than the HS and NR methods, but it reaches the highest accuracy.

4. Applications

4.1. Moving image compression

There are many methods used to compress data for transmission or storage of images. Frame skipping is one of the simplest methods of data compression for interframe motion images. For simplicity, suppose only the alternate frames are skipped. With no knowledge of the motion trajectory of the pixels, a skipped frame is generally reproduced either by repeating the preceding frame or by interpolation between the preceding and


Fig. 13. Affine motion with large rotation angle. (a) Desired flow map. (b) HS flow map. (c) FME flow map. (d) NR flow map. (e) BF flow map. (f) NFME flow map.

the following frames. Both of these methods harm the quality of motion reproduction, however. The former results in jerkiness in the reproduction of the motion while the latter blurs the moving areas.

Let U2k be the (2k)th frame, where frames 2, 4, ..., 2k, ... have been skipped. Then Û2k, the reproduced value of U2k, is obtained (without motion compensation) as follows:

Frame repetition:

$$\hat{U}_{2k}(m, n) = U_{2k-1}(m, n). \tag{41}$$

Frame interpolation:

$$\hat{U}_{2k}(m, n) = \tfrac{1}{2}\{U_{2k-1}(m, n) + U_{2k+1}(m, n)\}. \tag{42}$$

The disadvantages of frame repetition and interpolation can be overcome by predicting or interpolating the pixels of the skipped frame along its motion trajectory. Hence, with motion compensation, frame repetition


Table 4
The performance of various estimation methods on the affine motion with large rotation angle (translation (−3, 2), rotation 12°)

| Method | err1 | err2 | θerr | RMSE |
|---|---|---|---|---|
| HS | 2.0931 | 2.0834 | 38.616 | 3.1053 |
| FME | 2.0217 | 1.7948 | 33.401 | 3.2401 |
| NR | 1.9745 | 3.1130 | 49.448 | 2.6358 |
| BF | 0.7207 | 0.6481 | 18.618 | 0.6922 |
| NFME | 0.6606 | 0.6179 | 15.798 | 0.6231 |

Table 5
Average time taken by each motion estimation algorithm for computing the motion vectors of an image

| | HS | NR | FME | BF | NFME |
|---|---|---|---|---|---|
| Time (s) | 15 | 25 | 75 | 75 | 75 |

and frame interpolation equations are replaced by

$$\hat{U}_{2k}(m, n) = U_{2k-1}(m + q,\, n + l) \tag{43}$$

and

$$\hat{U}_{2k}(m, n) = \tfrac{1}{2}\{U_{2k-1}(m + q,\, n + l) + U_{2k+1}(m + q',\, n + l')\}, \tag{44}$$

respectively, where (q, l) and (q', l') are the motion vectors of Û2k relative to the preceding and the following frames, respectively. Note that q, l, q', l' in Eqs. (43) and (44) are all integers in image coordinates. How, then, should one proceed when q, l, q', l' are not integers, as is the case for the motion vectors produced by the proposed NFME method?
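The integer-vector case of Eqs. (43) and (44) can be sketched as below. The sketch assumes one integer motion vector per pixel, stored as `field[m][n] = (q, l)`, and clamps coordinates that fall outside the frame to the nearest border pixel; the function names and the border handling are assumptions for illustration, not part of the original method.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))

def compensated_repetition(prev, field):
    """Eq. (43): read each output pixel from the previous frame at (m + q, n + l)."""
    rows, cols = len(prev), len(prev[0])
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            q, l = field[m][n]
            src_row = clamp(m + q, 0, rows - 1)   # border clamping (assumption)
            src_col = clamp(n + l, 0, cols - 1)
            out[m][n] = prev[src_row][src_col]
    return out

def compensated_interpolation(prev, nxt, field_prev, field_next):
    """Eq. (44): average the two motion-compensated neighbouring frames."""
    a = compensated_repetition(prev, field_prev)
    b = compensated_repetition(nxt, field_next)
    return [[0.5 * (x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

With an all-zero motion field these two functions reduce to plain frame repetition, Eq. (41), and plain frame interpolation, Eq. (42).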

Coding and decoding systems (codecs) that use motion compensation with fractional-pel accuracy have been reported in [7,12–15,20,24]. Typically, fractional-pel accuracy is achieved by simple bilinear interpolation, which produces a spatially blurred prediction signal. The improvement gained in this way is referred to as the "filtering effect". Sinc interpolation, bilinear interpolation, and Wiener filtering were compared at integer-pel, 1/2-pel, 1/4-pel, and 1/8-pel accuracies. For the neural fuzzy motion estimator (NFME), by contrast, the motion vector has infinitesimal-subpixel accuracy. Hence, we propose a compensation method called the quarter compensation algorithm (QCA), an infinitesimal-subpixel compensation method, to compensate the interframe image according to the NFME's motion estimation.

Suppose the origin of an image is at the top-left corner and the positive directions are rightward and downward. Assume the motion vector v = (v_row, v_col) at some location (m, n) has been evaluated, and both v_row and v_col are floating-point numbers. Without loss of generality, it is assumed that both v_row and v_col are positive; i.e., the point at location (m, n) moves toward the lower-right corner. One separates the integer part and decimal part of the motion vector as follows:

$$v_{\mathrm{row}} = q + d_{\mathrm{row}}, \tag{45}$$

$$v_{\mathrm{col}} = l + d_{\mathrm{col}}, \tag{46}$$

where (q, l) and (d_row, d_col) are the integer and decimal parts of (v_row, v_col), respectively. The square that was originally at location (m, n) now lies over the four locations (m + q, n + l), (m + q + 1, n + l), (m + q, n + l + 1), and (m + q + 1, n + l + 1). A moved square is thus divided into four portions, with the area of each portion determined by the decimal part, (d_row, d_col), of the motion vector v. The area of each portion is viewed as a weighting factor in deciding new gray values. In other words, the gray value of the pixel that was originally at location (m, n) is distributed into the new gray values of the four locations (pixels) according to these weights.
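As a worked illustration of these weights, assume each pixel is a unit square, so that the overlap areas are products of the fractional offsets (this explicit form is an assumption; the passage above only states that the areas are determined by the decimal part). For a hypothetical motion vector v = (2.25, 1.5), the integer part is (q, l) = (2, 1) and the decimal part is (d_row, d_col) = (0.25, 0.5):

$$
\begin{aligned}
w_{(m+q,\; n+l)} &= (1 - d_{\mathrm{row}})(1 - d_{\mathrm{col}}) = 0.75 \times 0.5 = 0.375,\\
w_{(m+q+1,\; n+l)} &= d_{\mathrm{row}}\,(1 - d_{\mathrm{col}}) = 0.25 \times 0.5 = 0.125,\\
w_{(m+q,\; n+l+1)} &= (1 - d_{\mathrm{row}})\, d_{\mathrm{col}} = 0.75 \times 0.5 = 0.375,\\
w_{(m+q+1,\; n+l+1)} &= d_{\mathrm{row}}\, d_{\mathrm{col}} = 0.25 \times 0.5 = 0.125.
\end{aligned}
$$

The four weights sum to 1, so the gray value of the moved pixel is fully distributed over the four target locations.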


Hence, for any location (x1, x2) in the compensated image frame, its gray value Û2k(x1, x2) is decided by the following equation:

$$\hat{U}_{2k}(x_1, x_2) = \frac{\sum_{(m,n) \in D} w_{(m,n) \to (x_1,x_2)} \cdot U_{2k-1}(m, n)}{\sum_{(m,n) \in D} w_{(m,n) \to (x_1,x_2)}}, \tag{47}$$

where (x1, x2) is the image coordinate, D ≜ {(m, n) | one of the four portions of the moved point U2k−1(m, n) is located at position (x1, x2)}, w_{(m,n)→(x1,x2)} is the area (weighting factor) of the portion of U2k−1(m, n) that falls in location (x1, x2), and the sum of w_{(m,n)→(x1,x2)} over D is the normalization factor.

In Eq. (47), the numerator is the sum of the products of the weighting factors and the gray values of the moved points that fall in location (x1, x2), and the denominator is the sum of all weighting factors, which serves as the normalization factor in the recovery process.
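The procedure of Eqs. (45)–(47) can be sketched as a forward distribution of gray values followed by normalization. The sketch below is a minimal illustration under several assumptions that the text does not state: pixels are unit squares (so the weights are products of the fractional offsets), `math.floor` is used to split the vector so that negative components are handled the same way as positive ones, portions that fall outside the frame are discarded, and target pixels that receive no contribution keep the previous frame's value. The function name `quarter_compensate` is hypothetical.

```python
import math

def quarter_compensate(prev, field):
    """Distribute each pixel of `prev` along its real-valued motion vector
    field[m][n] = (v_row, v_col) and normalize as in Eq. (47)."""
    rows, cols = len(prev), len(prev[0])
    num = [[0.0] * cols for _ in range(rows)]   # numerator of Eq. (47)
    den = [[0.0] * cols for _ in range(rows)]   # denominator of Eq. (47)
    for m in range(rows):
        for n in range(cols):
            v_row, v_col = field[m][n]
            q, l = math.floor(v_row), math.floor(v_col)   # integer parts, Eqs. (45)-(46)
            d_row, d_col = v_row - q, v_col - l           # decimal parts in [0, 1)
            for dr, w_r in ((0, 1.0 - d_row), (1, d_row)):
                for dc, w_c in ((0, 1.0 - d_col), (1, d_col)):
                    x1, x2 = m + q + dr, n + l + dc
                    w = w_r * w_c                         # overlap area of this portion
                    if w > 0.0 and 0 <= x1 < rows and 0 <= x2 < cols:
                        num[x1][x2] += w * prev[m][n]
                        den[x1][x2] += w
    # Pixels that receive nothing fall back to the previous frame (assumption).
    return [[num[i][j] / den[i][j] if den[i][j] > 0.0 else prev[i][j]
             for j in range(cols)] for i in range(rows)]
```

For a one-row frame `[0, 8, 0, 0]` moved by half a pixel to the right, the bright pixel is split equally between its two neighbouring target locations, giving `[0, 4, 4, 0]`; an integer motion vector moves the value to a single location without blurring.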

Experiment. Two real-image frames containing moving objects were tested using the quarter compensation algorithm (QCA) for interframe motion compensation. The result is compared with that of the integer compensation method described by Eq. (43), and the performance is measured by the normalized mean square error (NMSE):

$$\mathrm{NMSE} = \frac{1}{N_1 \times N_2} \sum_{n_1} \sum_{n_2} \left[ U_{2k}(n_1, n_2) - \hat{U}_{2k}(n_1, n_2) \right]^2, \tag{48}$$

where U2k(n1, n2) is the original skipped image, Û2k(n1, n2) is the compensated image obtained from U2k−1(n1, n2), and N1 × N2 is the dimension of the image frame.
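Eq. (48) amounts to the squared gray-value error averaged over the frame. A minimal sketch, with the hypothetical function name `nmse` and frames given as 2-D lists of gray values:

```python
def nmse(original, compensated):
    """Eq. (48): squared error summed over the frame, divided by N1 x N2."""
    n1, n2 = len(original), len(original[0])
    total = 0.0
    for row_o, row_c in zip(original, compensated):
        for o, c in zip(row_o, row_c):
            total += (o - c) ** 2
    return total / (n1 * n2)
```

Identical frames give 0, and a single pixel that is off by 2 gray levels in a 2 × 2 frame gives 4/4 = 1.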

The results are shown in Fig. 14. Integrating the NFME with the QCA to compensate the interframe image gives better results than the integer compensation method, as can be observed from Fig. 14. Without motion compensation (i.e., the interframe is skipped), the error (NMSE) is as high as 22.514. This error is reduced to 10.658 using the integer compensation method, to 7.342 using the bilinear interpolation scheme, and to 2.764 using the proposed QCA.

4.2. Multiple moving object extraction

Letting a computer find a specific item among many uncertain moving objects is a difficult task. Human beings can easily distinguish different moving objects and grab them even when the light is dim or the object's outline is vague. A computer, however, needs additional information, such as motion vectors, to catch the moving objects and discard the redundancies in an image frame. There may be more than one moving object in an image frame when only one is sought, so extracting that object becomes an important step toward interpreting or recognizing it. Reasonable motion information clearly helps to extract a moving object from image sequences, and more accurate motion vectors provide more useful information about the moving objects. Hence, the proposed NFME can be used to compute motion vectors and extract moving objects because it provides accurate dynamic information. An experiment on multiple moving object extraction was conducted for this application.

Two image sequences with two moving objects contained in them are shown in Fig. 15. The motion estimation computed by the NFME system is shown in Fig. 16, where one can clearly see the two moving objects. As observed from this diagram, two different regions corresponding to the two different moving objects can be distinguished and extracted based on the combined information of moving angles and velocities transformed from the estimated motion vectors. Hence, a moving object heading North with a velocity of about 2 pixels/frame was extracted, as shown in Fig. 15(d).
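One simple way to turn angle and velocity information into an extraction mask is to threshold both quantities per pixel. The sketch below is only an illustration: the paper does not give its selection rule, so the tolerances, the function name `extract_mask`, and the heading convention (motion vectors stored as `(v_row, v_col)` with rows increasing downward, so that North is 90° and East is 0°) are all assumptions.

```python
import math

def extract_mask(field, heading_deg, speed, heading_tol=20.0, speed_tol=0.5):
    """Mark pixels whose motion vector matches a target heading and speed.

    field[m][n] = (v_row, v_col); tolerances are hypothetical defaults.
    """
    mask = []
    for row in field:
        mask_row = []
        for v_row, v_col in row:
            magnitude = math.hypot(v_row, v_col)
            # Rows grow downward, so negate v_row to make North equal +90 degrees.
            angle = math.degrees(math.atan2(-v_row, v_col))
            # Smallest absolute difference between the two headings, in degrees.
            diff = abs((angle - heading_deg + 180.0) % 360.0 - 180.0)
            mask_row.append(abs(magnitude - speed) <= speed_tol and diff <= heading_tol)
        mask.append(mask_row)
    return mask
```

For example, `extract_mask(field, 90.0, 2.0)` would keep pixels moving roughly North at about 2 pixels/frame and reject both static background and objects moving in other directions.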


Fig. 14. Motion estimation and compensation. (a) The previous image. (b) The present image. (c) The compensated image using the integer compensation method. (d) The compensated image using the QCA.

Fig. 15. The images for multiple moving objects extraction. (a) The previous image. (b) The present image. (c) The moving objects. (d) The extracted moving object.

5. Conclusions

The neural fuzzy motion estimator presented in this paper has been shown to provide accurate motion vector estimates for uniform and affine-modeled object motion. Based on block matching, each subblock in the search region is assigned a similarity membership that contributes a different degree to the estimated motion vector in the neural fuzzy motion estimator. This system is more reliable and robust in motion estimation than other methods such as Horn and Schunck's optical flow, the fuzzy logic motion estimator (FME), NR, fast block matching, etc. Motion estimation and compensation are an integral part of video compression since they enable the removal of naturally existing temporal redundancies. In this paper a neural motion compensation


Fig. 16. The flow map of multiple moving objects.

method, the QCA, is proposed. The QCA can perform infinitesimal-subpixel interframe motion compensation. This algorithm provides better results than the conventional methods based on integer motion vectors for interframe motion compensation. The proposed neural fuzzy motion estimator can be applied in various dynamic image-related applications where motion information is concerned. The proposed system is especially effective for extracting multiple moving objects. In this application, useful motion-related features were extracted from the estimated motion vectors. This is an important technique that can be applied at the earlier stages of vision, prior to higher-level tasks such as segmentation, recognition, or interpretation. Motion information has become basic knowledge for image understanding, image compression, vision-based control, etc. The accurate motion field estimated by the proposed system can provide useful motion information for making better decisions.

References

[1] P.K. Allen, A. Timcenko, B. Yoshimi, P. Michelman, Automated tracking and grasping of a moving object with a robotic hand-eye system, IEEE Trans. Robotics Automat. 9 (1993) 152–165.

[2] R. Baker, Waveform based very low rate video coding, Keynote address presented at the Internat. Note Workshop Very Low Bit Rate Video Compression, Urbana, IL, 1993.

[3] J.L. Barron, D.J. Fleet, S.S. Beauchemin, Systems and experiment performance of optical flow techniques, Internat. J. Comput. Vision 12 (1) (1994) 43–77.

[4] G.C. Buttazzo, B. Allotta, P. Fanizza, Mousebuster: a robot for real-time catching, IEEE Control Systems 14 (1) (1994) 49–56.

[5] K.W. Chun, J.B. Ra, An improved block matching algorithm based on successive refinement of motion vector candidates, Signal Processing: Image Com. 2 (1994) 115–122.

[6] J.H. Duncan, T.C. Chou, Temporal edges: the detection of motion and the computation of optical flow, Proc. 2nd Internat. Conf. Comput. Vis., Tampa, 1988, pp. 374–382.

[7] S. Ericsson, Fixed and adaptive predictors for hybrid predictive/transform coding, IEEE Trans. Commun. COM-33 (1985) 1291–1302.


[8] C. Fennema, W. Thompson, Velocity determination in scenes containing several moving objects, Comput. Graph. Image Process. 9 (1979) 301–315.

[9] D.J. Fleet, Measurement of Image Velocity, Kluwer Academic Publishers, Norwell, MA, 1992.

[10] H. Gharavi, M. Mills, Blockmatching motion estimation algorithm – new results, IEEE Trans. Circuits Systems 37 (1990) 649–651.

[11] B. Girod, D.J. Le Gall et al., Guest editorial: introduction to the special issue on image sequence compression, IEEE Trans. Image Process. 3 (1994) 465–468.

[12] B. Girod, F. Joubert, Motion-compensating prediction with fractional pel accuracy for 64 kbits/s coding of moving video, Proc. Internat. Workshop 64 kbit/s Coding Moving Video, Hannover, Germany, 1988.

[13] B. Girod, The efficiency of motion-compensating prediction for hybrid coding of video sequences, IEEE J. Select. Areas Commun. SAC-5 (1987) 1140–1154.

[14] B. Girod, Motion-compensating prediction with fractional-pel accuracy, IEEE Trans. Commun. 41 (1993) 604–612.

[15] M. Gilge, A high quality videophone coder using hierarchical motion estimation and structure coding of the prediction error, Proc. SPIE Conf. Visual Commun. Image Proc. '88 1001, Cambridge, MA, SPIE, 1988, pp. 864–874.

[16] B.K.P. Horn, B.G. Schunck, Determining optical flow, Artificial Intell. 17 (1981) 185–204.

[17] J. Hwang, Y. Ooi, S. Ozawa, An adaptive sensing system with tracking and zooming a moving object, IEICE Trans. Inform. Systems E76-D (1993) 926–934.

[18] J.R. Jain, A.K. Jain, Displacement measurement and its applications in interframe coding, IEEE Trans. Commun. COM-29 (1981) 1799–1808.

[19] H.M. Jong, L.G. Chen, T.D. Chiueh, Parallel architectures for 3-step hierarchical search blockmatching algorithm, IEEE Trans. Circuits Systems Video Technology 4 (1994) 407–416.

[20] C.D. Kuglin, D.C. Hines, The phase correlation image alignment method, Proc. IEEE 1975 Internat. Conf. Cybernet. Soc. (1975) 163–165.

[21] C.T. Lin, C.S. George Lee, Neural-network-based fuzzy logic control and decision system, IEEE Trans. Comput. 40 (1991) 1320–1366.

[22] H. Li, A. Lundmark, R. Forchheimer, Image sequence coding at very low bitrates: a review, IEEE Trans. Image Process. 3 (5) (1994) 589–609.

[23] J.I. Lipp, Frame-to-frame image motion estimation with a fuzzy logic system, Proc. Circ. and Syst. Conf., 1992, pp. 987–990.

[24] F. May, Model based movement compensation and interpolation for ISDN videotelephony, IEEE Int. Symp. Circuits Syst. (ISCAS 88), Espoo, Finland, 1988.

[25] A. Murat Tekalp, Digital Video Processing, Prentice-Hall, Englewood Cliffs, NJ, 1995.

[26] H.H. Nagel, Displacement vectors derived from second-order intensity variations in image sequences, Comp. Graph Image Process. 21 (1983) 85–117.

[27] H.H. Nagel, On the estimation of optical flow: relations between different approaches and some new results, Artificial Intell. 33 (1987) 299–324.

[28] A.N. Netravali, J.D. Robbins, Motion compensated television: Part I, Bell System Tech. J. 58 (1979) 631–670.

[29] R. Plompen, Motion video coding for visual telephony, PTT Research Neher Laboratories, 1989.

[30] O. Tretiak, L. Pastor, Velocity estimation from image sequences with second-order differential operators, Proc. 7th Internat. Conf. Patt. Recog., Montreal, 1984, pp. 20–22.
