New techniques on deformed image motion estimation and compensation


Correspondence

New Techniques on Deformed Image Motion Estimation and Compensation

Chin-Teng Lin, Gin-Der Wu, and Shih-Chieh Hsiao

Abstract— In this paper, new techniques for deformed image motion estimation and compensation using variable-size block-matching are proposed, which can be applied to an image sequence compression system or a moving object recognition system. Motion estimation and compensation techniques have been successfully applied in the area of image sequence coding. Many research papers on improving the performance of these techniques have been published, and many directions have been proposed, all of which can lead to better performance than the conventional techniques. Among them, generalized block-matching and variable-size block-matching have been successfully applied to reducing the data rate of compensation error and of motion information, respectively. These two algorithms have their merits, but both also have drawbacks. Moreover, reducing the data rate of the compensation error sometimes increases the data rate of the motion information, or vice versa. Based on these two algorithms, we propose and examine several algorithms that are effective in reducing the data rate. We then incorporate these algorithms into a system in which they work together to overcome the disadvantages of the individual algorithms while keeping their merits. The proposed system can optimally balance the data rate between the two aspects (i.e., compensation error and motion information). Experimental results show that the proposed system outperforms the conventional techniques. Since we propose a recovery operation that tries to recover incorrect motion vectors from the global motion, the proposed system can also be applied to moving object recognition in image sequences.

Index Terms— Block-matching, data rate, genetic algorithm, image motion estimation, moving object recognition.

I. INTRODUCTION

For the purpose of reducing temporal redundancy in image sequences, motion estimation and compensation techniques have been successfully applied [1]–[3]. Motion estimation and compensation can be regarded as determining a coordinate transformation that is applied to every pixel of the frame. The coefficients of the transformation are chosen by minimizing a distortion measure. Assume that $I_t(x, y)$ and $I_{t-1}(x, y)$ represent the pixel values of two consecutive frames, and that $R$ is a reference block in $I_t$. We can express block-matching motion estimation as finding the motion vector $(d_x, d_y)$ which satisfies the following requirement:

$$\min \sum_{(x, y) \in R} \bigl(I_t(x, y) - I_{t-1}(x + d_x, y + d_y)\bigr)^2. \tag{1}$$

In block-matching motion estimation, the assumption that an image is composed of rigid objects in pure translational motion is employed. In reality, however, motion is a complex combination of translation and rotation. In order to cope with rotation as well as other nonlinear deformations, a general approach to block-matching motion estimation was proposed in [4]. A geometric transformation instead of pure translation is employed and (1) is generalized to

$$\min \sum_{(x, y) \in R} \bigl(I_t(x, y) - I_{t-1}(f(x, y), g(x, y))\bigr)^2 \tag{2}$$

where $f(x, y)$ and $g(x, y)$ are the transformation functions applied to the coordinate values $x$ and $y$, respectively. Geometric transformations include affine, perspective, and polynomial transformations. In all these transformations, translation is a special case.

Manuscript received June 19, 1998; revised August 22, 1999. This work was supported by the National Science Council, R.O.C., under Grant NSC 88-2622-E-009-006. This paper was recommended by Associate Editor M. S. Obaidat.

The authors are with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C.

Publisher Item Identifier S 1083-4419(99)09307-3.

Variable-size block-matching motion compensation is proposed in [5]–[7] to tackle the problem of nonuniform motion in search blocks of block-matching algorithm. In the algorithm, an image is adaptively divided into blocks of variable size to meet the assumption of uniform motion for all blocks. The algorithm is a split or merge segmentation scheme based on regular decomposition of an image into blocks of varying sizes, each of which has more or less uniform motion parameters. In comparison with traditional fixed size block-matching schemes, better results are obtained and fewer overhead bits are required.

The generalized block-matching algorithms can estimate the motion information more correctly than the conventional block-matching methods and thus reduce the data rate of the compensation error. However, more bits are required to represent the motion information, which means that the data rate of the motion information is increased. Similarly, variable-size block-matching techniques effectively reduce the data rate in one aspect (motion information or compensation error), but increase the data rate in the other.

In this paper, we propose a system which can effectively reduce the total data rate consumed in off-line image sequence compression. The proposed system incorporates the merits of generalized block-matching and of variable-size block-matching, and eliminates the drawbacks of both techniques. The system reduces and optimally balances the data rate in both aspects, and thus reduces the total data rate. Since we propose a recovery operation which tries to recover incorrect motion vectors from the global motion, the proposed system can also be applied to moving object recognition in image sequences. The simplified architecture of the proposed system is shown in Fig. 1. The functions of the proposed system are described briefly below. In the first layer, which is shown at the bottom of Fig. 1, the predicted frame is partitioned into blocks of the smallest size. In the next layer, blocks are classified into several clusters of variable size. Pixels of the same cluster are then transformed to a deformed region by the output function of that cluster, $f_{ij}$. Eventually, a region in the reference frame is obtained that is the best fit of the block in the predicted frame.

This paper is organized as follows. In Section II, learning of output function is introduced. In Section III, partitioning of the input space is discussed. The proposed system is then described in Section IV. Simulations are conducted in Sections II-E and IV-B, which show that the performance of the proposed system is superior to that of the conventional block-matching motion compensation.

II. LEARNING OF OUTPUT FUNCTION

In this section, we address the learning of the output function. First, we formulate the mathematical basis for determining the coefficients of a transformation. We then propose two learning algorithms, the recursive least-square (RLS) and genetic algorithm (GA) learning algorithms. Finally, we conclude the section with simulations on the proposed learning algorithms.

Fig. 1. Architecture of the proposed system.

1083–4419/99$10.00 © 1999 IEEE

A. Determination of Transformation Coefficients

In this section, we introduce how to solve for coefficients of a transformation, which is the basis of the proposed learning algorithms in the following sections.

An affine transformation has six degrees of freedom, corresponding directly to the coefficients $a_{11}, a_{21}, a_{31}, a_{12}, a_{22}, a_{32}$, and is of the form

$$x = a_{11}u + a_{21}v + a_{31}, \qquad y = a_{12}u + a_{22}v + a_{32}. \tag{3}$$

These six coefficients may be determined by specifying the coordinate correspondence of three noncollinear points in both images. Let $(u_k, v_k)$ and $(x_k, y_k)$ for $k = 0, 1, 2$ be three pairs of points in the reference and predicted frames, respectively. The affine mapping can be determined by solving the following system of six linear equations:

$$\begin{pmatrix} u_0 & v_0 & 1 & 0 & 0 & 0 \\ u_1 & v_1 & 1 & 0 & 0 & 0 \\ u_2 & v_2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & u_0 & v_0 & 1 \\ 0 & 0 & 0 & u_1 & v_1 & 1 \\ 0 & 0 & 0 & u_2 & v_2 & 1 \end{pmatrix} \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{12} \\ a_{22} \\ a_{32} \end{pmatrix} = \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ y_0 \\ y_1 \\ y_2 \end{pmatrix}. \tag{4}$$

Let the system of equations given above be denoted as

$$WA = X. \tag{5}$$

Solving for the coefficients in terms of the known $(u_k, v_k)$ and $(x_k, y_k)$ pairs, we have

$$A = W^{-1}X. \tag{6}$$

The constraint on $W$ to consist of noncollinear points serves to ensure that $W$ is nonsingular, i.e., $\det(W) \neq 0$. Consequently, the inverse $W^{-1}$ is guaranteed to exist.
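As an illustration of (4)–(6), the following is a minimal sketch that builds $W$ and $X$ from three point pairs and solves for the affine coefficients. The function name `solve_affine` and the use of NumPy are assumptions made for this example, not part of the original method description; a linear solve is used in place of an explicit inverse.

```python
import numpy as np

def solve_affine(ref_pts, pred_pts):
    """Solve W A = X of (4)-(5) for A = [a11, a21, a31, a12, a22, a32].

    ref_pts  : three (u_k, v_k) points in the reference frame
    pred_pts : three (x_k, y_k) points in the predicted frame
    (hypothetical helper; the points must be noncollinear)
    """
    ref_pts = np.asarray(ref_pts, dtype=float)
    pred_pts = np.asarray(pred_pts, dtype=float)
    rows = np.column_stack([ref_pts, np.ones(3)])   # rows [u_k, v_k, 1]
    W = np.zeros((6, 6))
    W[:3, :3] = rows                                # equations for x_k
    W[3:, 3:] = rows                                # equations for y_k
    X = np.concatenate([pred_pts[:, 0], pred_pts[:, 1]])
    return np.linalg.solve(W, X)                    # equivalent to W^{-1} X

# Example: x = 2u + v + 3, y = -u + 4v + 5
A = solve_affine([(0, 0), (1, 0), (0, 1)], [(3, 5), (5, 4), (4, 9)])
```

For the example points, the recovered vector should match the coefficients used to generate them, up to floating-point rounding.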

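$$x = a_{11}u + a_{21}v + a_{31} - a_{13}ux - a_{23}vx, \qquad y = a_{12}u + a_{22}v + a_{32} - a_{13}uy - a_{23}vy. \tag{8}$$

Applying (8) to the four pairs of correspondence points yields the following $8 \times 8$ system of equations:

$$\begin{pmatrix} u_0 & v_0 & 1 & 0 & 0 & 0 & -u_0x_0 & -v_0x_0 \\ u_1 & v_1 & 1 & 0 & 0 & 0 & -u_1x_1 & -v_1x_1 \\ u_2 & v_2 & 1 & 0 & 0 & 0 & -u_2x_2 & -v_2x_2 \\ u_3 & v_3 & 1 & 0 & 0 & 0 & -u_3x_3 & -v_3x_3 \\ 0 & 0 & 0 & u_0 & v_0 & 1 & -u_0y_0 & -v_0y_0 \\ 0 & 0 & 0 & u_1 & v_1 & 1 & -u_1y_1 & -v_1y_1 \\ 0 & 0 & 0 & u_2 & v_2 & 1 & -u_2y_2 & -v_2y_2 \\ 0 & 0 & 0 & u_3 & v_3 & 1 & -u_3y_3 & -v_3y_3 \end{pmatrix} \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{12} \\ a_{22} \\ a_{32} \\ a_{13} \\ a_{23} \end{pmatrix} = \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}. \tag{9}$$

We can thus determine the coefficients by solving the linear system. This yields a solution to the general quadrilateral-to-quadrilateral problem.

In addition to the affine and perspective transformations used in [4], we introduce the polynomial transformation in this paper. The polynomial transformation was originally intended to account for sensor-related spatial distortions and external image distortions. The form of the bivariate polynomial transformation is

$$x = \sum_{i=0}^{N} \sum_{j=0}^{N-i} a_{ij} u^i v^j, \qquad y = \sum_{i=0}^{N} \sum_{j=0}^{N-i} b_{ij} u^i v^j. \tag{10}$$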

Note that a first-degree (N = 1) bivariate polynomial defines the mapping function that is exactly given by an affine transformation.

In the polynomial transformation, the number of coefficients is related to the degree of the polynomial. A polynomial transformation of degree $N$ has $2K$ coefficients, $K$ for $a_{ij}$ and $K$ for $b_{ij}$, where

$$K = \sum_{i=0}^{N} \sum_{j=0}^{N-i} 1 = \frac{(N + 1)(N + 2)}{2}. \tag{11}$$

If we provide $K$ pairs of correspondence points, we can solve the following systems of equations and determine the coefficients of a polynomial of degree $N$:

$$\begin{pmatrix} 1 & u_0 & v_0 & u_0v_0 & u_0^2 & v_0^2 & \cdots & u_0^N & v_0^N \\ 1 & u_1 & v_1 & u_1v_1 & u_1^2 & v_1^2 & \cdots & u_1^N & v_1^N \\ 1 & u_2 & v_2 & u_2v_2 & u_2^2 & v_2^2 & \cdots & u_2^N & v_2^N \\ \vdots & & & & & & & & \vdots \\ 1 & u_{K-1} & v_{K-1} & u_{K-1}v_{K-1} & u_{K-1}^2 & v_{K-1}^2 & \cdots & u_{K-1}^N & v_{K-1}^N \end{pmatrix} \cdot \begin{pmatrix} a_{00} \\ a_{10} \\ a_{01} \\ a_{11} \\ a_{20} \\ a_{02} \\ \vdots \\ a_{N0} \\ a_{0N} \end{pmatrix} = \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{K-1} \end{pmatrix} \tag{12}$$

and

$$\begin{pmatrix} 1 & u_0 & v_0 & u_0v_0 & u_0^2 & v_0^2 & \cdots & u_0^N & v_0^N \\ 1 & u_1 & v_1 & u_1v_1 & u_1^2 & v_1^2 & \cdots & u_1^N & v_1^N \\ 1 & u_2 & v_2 & u_2v_2 & u_2^2 & v_2^2 & \cdots & u_2^N & v_2^N \\ \vdots & & & & & & & & \vdots \\ 1 & u_{K-1} & v_{K-1} & u_{K-1}v_{K-1} & u_{K-1}^2 & v_{K-1}^2 & \cdots & u_{K-1}^N & v_{K-1}^N \end{pmatrix} \cdot \begin{pmatrix} b_{00} \\ b_{10} \\ b_{01} \\ b_{11} \\ b_{20} \\ b_{02} \\ \vdots \\ b_{N0} \\ b_{0N} \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{K-1} \end{pmatrix}. \tag{13}$$
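To make the coefficient count in (11) and the systems (12) and (13) concrete, here is a small sketch that builds the monomial matrix and solves for $a_{ij}$ and $b_{ij}$. The helper names (`num_coeffs`, `design_matrix`, `fit_polynomial`) are hypothetical, and the columns are ordered by increasing $i$ and then $j$, which is an assumption that differs from the column order printed in (12); the solution is the same up to that permutation.

```python
import numpy as np

def num_coeffs(N):
    """K of (11): number of monomials u^i v^j with i + j <= N."""
    return (N + 1) * (N + 2) // 2

def design_matrix(pts, N):
    """One row per point, one column per monomial u^i v^j (i outer, j inner)."""
    pts = np.asarray(pts, dtype=float)
    u, v = pts[:, 0], pts[:, 1]
    cols = [u**i * v**j for i in range(N + 1) for j in range(N + 1 - i)]
    return np.column_stack(cols)

def fit_polynomial(ref_pts, pred_pts, N):
    """Solve (12) and (13) in the least-squares sense for a_ij and b_ij."""
    M = design_matrix(ref_pts, N)
    pred = np.asarray(pred_pts, dtype=float)
    a, *_ = np.linalg.lstsq(M, pred[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(M, pred[:, 1], rcond=None)
    return a, b

# Six points in general position determine a degree-2 polynomial (K = 6).
ref = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (0, 2)]
pred = [(1 + 0.5 * u**2, v + u * v) for u, v in ref]
a, b = fit_polynomial(ref, pred, N=2)
M = design_matrix(ref, 2)
```

With exactly $K$ well-placed points the least-squares solve reduces to the exact solution of the square system; with more points it gives a best fit instead.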

B. Supervised Learning of Output Function

In the previous section, the concept of geometric transformations and methods to solve for their coefficients were introduced. To solve the system of equations, we are required to provide a sufficient number of noncollinear point pairs in both frames. In an image sequence, however, it is not easy to estimate the coordinates of the corresponding point in another frame when a particular point in the current frame is given. In other words, it is almost impossible to provide accurate point pairs in both frames. The exhaustive search algorithm used in [4] can find the coefficients, but it takes a very long computation time even when the search region is small, which is not practical.

In this section, we propose a recursive least-square (RLS) learning algorithm to find the coefficients of a geometric transformation. Let the $i$th row vector of the matrix $W$ defined in (5) be $w_i$ and the $i$th element of $X$ defined in (5) be $x_i$. Then $A$ can be calculated using the following recursive least-square formula:

$$a_{i+1} = a_i + S_{i+1} w_{i+1}^T (x_{i+1} - w_{i+1} a_i) \tag{14}$$

$$S_{i+1} = S_i - \frac{S_i w_{i+1}^T w_{i+1} S_i}{1 + w_{i+1} S_i w_{i+1}^T}, \qquad i = 0, 1, \ldots, p - 1 \tag{15}$$

$$A = a_p \tag{16}$$

with initial conditions of

$$a_0 = 0 \quad \text{and} \quad S_0 = \gamma I \tag{17}$$

where $\gamma$ is a large positive number, $I$ is the identity matrix of dimension $K \times K$, and $K$ is the number of columns in $W$.
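The recursion (14)–(17) can be sketched as follows. This is an illustrative implementation only: the function name `rls_solve`, the default value of `gamma`, and the synthetic test data are assumptions of this example. The update of $S$ uses the fact that $S$ stays symmetric, so $S w^T w S$ equals the outer product of $S w^T$ with itself.

```python
import numpy as np

def rls_solve(W, X, gamma=1e6):
    """Recursive least squares following (14)-(17).

    W     : (p, K) matrix whose rows are the training inputs w_i
    X     : (p,) vector of desired outputs x_i
    gamma : large positive scale of the initial matrix S_0 (assumed value)
    """
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float)
    K = W.shape[1]
    a = np.zeros(K)              # a_0 = 0
    S = gamma * np.eye(K)        # S_0 = gamma * I
    for w, x in zip(W, X):
        Sw = S @ w
        S = S - np.outer(Sw, Sw) / (1.0 + w @ Sw)   # (15)
        a = a + (S @ w) * (x - w @ a)               # (14), uses the updated S
    return a                                        # (16)

rng = np.random.default_rng(0)
W = rng.random((20, 3))
true_a = np.array([1.0, -2.0, 0.5])
a_hat = rls_solve(W, W @ true_a)
```

Because $S_0$ is finite, the result is a slightly regularized least-squares estimate; a larger `gamma` reduces that bias at some cost in numerical conditioning.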

Since the derivative of (7) shows that the RLS method has poor convergence accuracy for nonlinear transformations, we cannot solve the system in (9) using the RLS method directly. In this paper, an alternative approach is proposed according to this observation. First, the correspondence pairs of points are shifted to zero mean by the same amount of displacement $(s_x, s_y)$. Second, the shifted data are arranged as row vectors. The row vectors are then applied to the RLS method sequentially, and the learning result, $A'$, is obtained. Finally, the exact solution, $A$, can be recovered from $A'$ using the following equation:

$$\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{12} \\ a_{22} \\ a_{32} \\ a_{13} \\ a_{23} \end{pmatrix} = \begin{pmatrix} (a'_{11} + a'_{13}s_x)/\mathrm{div} \\ (a'_{21} + a'_{23}s_x)/\mathrm{div} \\ (a'_{31} - a'_{11}s_x - a'_{21}s_y + s_x \cdot \mathrm{div})/\mathrm{div} \\ (a'_{12} + a'_{13}s_y)/\mathrm{div} \\ (a'_{22} + a'_{23}s_y)/\mathrm{div} \\ (a'_{32} - a'_{12}s_x - a'_{22}s_y + s_y \cdot \mathrm{div})/\mathrm{div} \\ a'_{13}/\mathrm{div} \\ a'_{23}/\mathrm{div} \end{pmatrix} \tag{18}$$

where $\mathrm{div} = 1 - a'_{13}s_x - a'_{23}s_y$, and $a_{ij}$ and $a'_{ij}$ are elements of $A$ and $A'$, respectively. In simulations, the coefficients of the perspective transformation are determined correctly with the proposed approach.
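The recovery step in (18) can be checked numerically. The sketch below assumes the usual perspective form with the last coefficient normalized to one, $x = (a_{11}u + a_{21}v + a_{31})/(a_{13}u + a_{23}v + 1)$ and similarly for $y$, which is the form consistent with (8); the function names and the sample coefficient values are hypothetical.

```python
import numpy as np

def perspective(A, u, v):
    """Apply the assumed perspective mapping with coefficients A (length 8)."""
    a11, a21, a31, a12, a22, a32, a13, a23 = A
    den = a13 * u + a23 * v + 1.0
    return (a11 * u + a21 * v + a31) / den, (a12 * u + a22 * v + a32) / den

def recover(Ap, sx, sy):
    """Map coefficients A' learned on shifted data back to A, following (18)."""
    a11, a21, a31, a12, a22, a32, a13, a23 = Ap
    div = 1.0 - a13 * sx - a23 * sy
    return np.array([
        (a11 + a13 * sx) / div,
        (a21 + a23 * sx) / div,
        (a31 - a11 * sx - a21 * sy + sx * div) / div,
        (a12 + a13 * sy) / div,
        (a22 + a23 * sy) / div,
        (a32 - a12 * sx - a22 * sy + sy * div) / div,
        a13 / div,
        a23 / div,
    ])

# Arbitrary shifted-domain coefficients and shift, chosen for illustration.
Ap = [1.1, 0.05, 2.0, -0.03, 0.95, -1.0, 0.001, 0.002]
sx, sy = 4.0, 3.0
A = recover(Ap, sx, sy)

u, v = 7.0, 5.0
xs, ys = perspective(Ap, u - sx, v - sy)   # mapping in shifted coordinates
x, y = perspective(A, u, v)                # mapping in original coordinates
```

Under these assumptions, mapping a point with the recovered $A$ should agree with mapping the shifted point with $A'$ and then adding the shift back.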

In the RLS learning of the polynomial transformation, the transformation function in (10) is also nonlinear. Similarly, we can shift the correspondence pairs of points to zero mean by the displacement $(s_x, s_y)$, apply RLS to obtain $a'_{ij}$ and $b'_{ij}$, and recover the coefficients $a_{ij}$ and $b_{ij}$ of the polynomial transformation from $a'_{ij}$ and $b'_{ij}$ using (19) and (20), shown at the bottom of the page. The supervised learning algorithm is summarized as follows.

Step 1: Select $K$ points, randomly or not. Note that the number of noncollinear points among the selected points must be equal to or greater than the number of points required to properly determine the selected transformation.

Step 2: Apply the full-search block-matching algorithm to the selected points to obtain their motion vectors.

Step 3: Coordinate of each point is used as input, and the sum of each point’s coordinate and its associated motion vector serves as desired output. They form a pair of training data.

Step 4: Apply training data sequentially to the RLS algorithm to learn the coefficients of transformation.
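Step 2 relies on full-search block matching, i.e., an exhaustive minimization of (1) over a search window. A minimal sketch is given below; the block size, search range, array layout (`image[row, column]`), and the function name `block_match` are assumptions of this example.

```python
import numpy as np

def block_match(cur, ref, top, left, size=8, search=4):
    """Full search for the (dx, dy) that minimizes the squared error of (1).

    cur, ref  : current frame I_t and reference frame I_{t-1} (2-D arrays)
    top, left : row and column of the block's upper-left corner in cur
    """
    block = cur[top:top + size, left:left + size].astype(np.float64)
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if r < 0 or c < 0 or r + size > ref.shape[0] or c + size > ref.shape[1]:
                continue
            cand = ref[r:r + size, c:c + size].astype(np.float64)
            cost = np.sum((block - cand) ** 2)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost

# Synthetic test: the current frame is the reference shifted by (dx, dy) = (3, 2).
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
cur = np.roll(ref, shift=(-2, -3), axis=(0, 1))   # cur[y, x] = ref[y + 2, x + 3]
mv, cost = block_match(cur, ref, top=8, left=8)
```

The resulting motion vector, added to the block coordinate, gives the desired output used in Step 3.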

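$$a_{d_x d_y} = \begin{cases} \displaystyle\sum_{i \ge d_x} \sum_{\substack{j \ge d_y \\ i + j \le N}} a'_{ij} \binom{i}{d_x} \binom{j}{d_y} (-s_x)^{i - d_x} (-s_y)^{j - d_y}, & \text{if } (d_x, d_y) \neq (0, 0) \\[2ex] \displaystyle\sum_{i \ge 0} \sum_{\substack{j \ge 0 \\ i + j \le N}} a'_{ij} (-s_x)^{i} (-s_y)^{j} + s_x, & \text{if } d_x = d_y = 0 \end{cases} \tag{19}$$

and

$$b_{d_x d_y} = \begin{cases} \displaystyle\sum_{i \ge d_x} \sum_{\substack{j \ge d_y \\ i + j \le N}} b'_{ij} \binom{i}{d_x} \binom{j}{d_y} (-s_x)^{i - d_x} (-s_y)^{j - d_y}, & \text{if } (d_x, d_y) \neq (0, 0) \\[2ex] \displaystyle\sum_{i \ge 0} \sum_{\substack{j \ge 0 \\ i + j \le N}} b'_{ij} (-s_x)^{i} (-s_y)^{j} + s_y, & \text{if } d_x = d_y = 0 \end{cases} \tag{20}$$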


Fig. 2. Coding of the string in the binary-coding GA.

C. Genetic Algorithm Learning of Output Function

The genetic algorithm (GA) is a general-purpose optimization algorithm with a probabilistic component that provides a means to search poorly understood, irregular spaces. Holland [8] developed the GA to simulate some of the processes observed in natural evolution. In this section, we propose two GA learning schemes for output functions.

1) Binary Coding Genetic Algorithm: In Section II-B, it is pointed out that if we supply a sufficient number of point pairs (three for the affine transformation, four for the perspective transformation, and $K$ of (11) for the polynomial transformation), the coefficients can be determined by solving a system of equations [(4), (9), and (12) and (13) for the three transformations, respectively]. In other words, a transformation is determined by a group of point pairs $(u_k, v_k)$ and $(x_k, y_k)$.

To apply the binary-coding GA to learning the transformation coefficients, we need to generate the strings from the coordinates of points, as follows. We fix the points $(x_k, y_k)$ in the compensated frame, and vary the points $(u_k, v_k)$ in the reference frame. The variable points $(u_k, v_k)$ are coded as strings in the GA. A sufficient number of $(u_k, v_k)$ and $(x_k, y_k)$ pairs can represent a transformation; hence, a transformation is thus coded. For example, in the perspective transformation case, four pairs of points are required to solve for the coefficients. Therefore, four fixed points in the compensated frame, e.g., $(0, 0)$, $(8, 0)$, $(8, 8)$, and $(0, 8)$, are selected beforehand, and the four correspondence points, $(u_0, v_0)$, $(u_1, v_1)$, $(u_2, v_2)$, and $(u_3, v_3)$, are coded as a string of eight bytes. To decode the string, a system of equations is constructed from the four pairs of points and the coefficients are obtained by solving the system. The obtained coefficients (i.e., the obtained transformations $f$ and $g$) are evaluated by applying them to (2). The smaller the value of (2) is, the better the obtained coefficients (transformations) are. The coding of the string in the binary-coding GA is illustrated in Fig. 2.

The following steps are employed to generate and handle a set of strings (i.e., a population) in the GA learning [9].

Step 1: Initialization: We first generate an initial population containing $N_{pop}$ strings, where $N_{pop}$ is the number of strings in each population.

Step 2: Fitness Function: In this step, each string is decoded by an evaluator into an objective function value called fitness, $FIT$. A fitness value is assigned to each individual in the population. According to (2), the fitness function is defined by

$$FIT = \sum_{(x, y) \in R} \bigl|I_t(x, y) - \tilde{I}_t(x, y)\bigr| = \sum_{(x, y) \in R} \bigl|I_t(x, y) - I_{t-1}(f(x, y), g(x, y))\bigr| \tag{21}$$

where $f$ and $g$ are the functions of transformation which are determined by the coded string. The purpose of the GA learning in our case is to search for the transformation which minimizes the above fitness function.

Step 3: Reproduction: Reproduction is a process in which individual strings are copied according to their fitness values, i.e., based on the principle of survival of the fittest.

Step 4: Crossover: We use the N-point crossover operator, in which a string is $N$ bytes in length. At first, two strings from the reproduced population are mated at random, and $N$ crossover points ($N$ bit positions, one crossover point for each byte) are selected according to the type of transformation we are learning. Then the strings are crossed and separated at these points to produce two new strings.

Step 5: Mutation: Mutation is the random alteration of bits in the string, which assists in keeping diversity in the population. As a source of new bits, mutation is introduced and is applied with a probability $p_m$. In addition, a variation operation in the GA is adopted, which enforces preserving the best string.

Step 6: If the stopping condition is not met (e.g., the error is above a predefined level), return to Step 2. Otherwise, the GA is terminated.

Our experiments show that the learning curve of the binary-coding GA on a block of uniform motion falls in the first several generations and remains unchanged thereafter. In our experience, the binary-coding GA typically converges within several generations, except in blocks of nonuniform motion caused by uncovered background or variation of illumination, which take more generations to converge.

2) Floating-Point Genetic Algorithm: In the previous section, we code the coordinates of points (pixels) as a string, instead of coding the coefficients themselves. Because of the restriction on binary coding, we can use only integer grid points. It is possible to extend the points to subpixel accuracy. However, strings will be very long if infinitesimal-subpixel accuracy is desired. Moreover, the computation time of decoding a string into coefficients, which occurs in the stage of evaluating the fitness value, occupies a significant portion of the execution time in the GA learning.

Fig. 3. Illustration of crossover operation for floating-point strings: (a) before crossover and (b) after crossover.

Fig. 4. Illustration of mutation operation for a floating-point string: (a) before mutation and (b) after mutation.

In order to increase the accuracy of the estimated coefficients and lessen the computation load, the floating-point GA [10] is adopted. In the floating-point GA, we can code the coefficients themselves as strings of real values. Thus, the computation load of decoding is eliminated. Initially, the GA randomly generates a population of floating-point strings. An interpreter takes each floating-point string and uses it to set the coefficients of the transformation. In this way, according to a defined fitness function, a fitness value is assigned to each string in the population. The GA can then look for a better set of strings to form a new population as the next generation. The crossover operation is demonstrated in Fig. 3. The steps to generate and handle a set of strings in the floating-point GA learning are similar to those in the binary-coding GA except for the mutation process, in which a random number $\Delta(t, i)$ is added to the string. The range of $\Delta(t, i)$ varies according to the $t$th generation and the $i$th element in a string. Fig. 4 shows an example illustrating the mutation operation. Typically, the learning curve of the floating-point GA on a block of uniform motion falls quickly in the first several generations, and in the following generations the curve declines slightly or stays at the same value.
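The floating-point GA loop described above can be sketched as follows. Several details here are assumptions made only for illustration: truncation selection stands in for the reproduction step, a single crossover point is used, and the mutation range $\Delta(t, i)$ shrinks linearly with the generation $t$ and is the same for every element $i$. The function name `float_ga` and all parameter defaults are hypothetical.

```python
import numpy as np

def float_ga(fitness, dim, pop_size=20, generations=50, p_m=0.1,
             init_range=1.0, seed=0):
    """Minimize `fitness` over real-valued strings of length `dim`."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-init_range, init_range, size=(pop_size, dim))
    history = []
    for t in range(generations):
        fit = np.array([fitness(s) for s in pop])
        order = np.argsort(fit)
        best = pop[order[0]].copy()          # best string is always preserved
        history.append(float(fit[order[0]]))
        parents = pop[order[:pop_size // 2]]  # assumed truncation selection
        scale = init_range * (1.0 - t / generations)  # assumed range of Delta(t, i)
        children = []
        while len(children) < pop_size - 1:
            i, j = rng.integers(0, len(parents), size=2)
            cut = rng.integers(1, dim) if dim > 1 else 0
            child = np.concatenate([parents[i][:cut], parents[j][cut:]])
            mask = rng.random(dim) < p_m      # mutate each element with prob. p_m
            child = child + mask * rng.uniform(-scale, scale, size=dim)
            children.append(child)
        pop = np.vstack([best] + children)
    fit = np.array([fitness(s) for s in pop])
    history.append(float(fit.min()))
    return pop[np.argmin(fit)], history

# Toy fitness standing in for (21): squared distance to a known coefficient vector.
target = np.array([0.5, -0.25, 0.1])
best, hist = float_ga(lambda s: float(np.sum((s - target) ** 2)), dim=3)
```

Because the best string of each generation is carried over unchanged, the recorded best fitness never increases from one generation to the next. In the motion estimation setting, `fitness` would evaluate (21) for the transformation encoded by the string.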

D. Infinitesimal-Subpixel Compensation

In this section, we propose a quarter compensation algorithm (QCA). Note that the conventional motion compensation method operates only on the integer grid. We cannot apply the motion parameters estimated by generalized block-matching to the conventional motion compensation technique directly, because the obtained coordinate value of a point is not restricted to an integer. QCA is thus proposed to tackle this problem.

Suppose that the origin of an image is at the top-left corner, and the positive directions are rightward and downward; moreover, the motion vector $d = (d_x, d_y)$ at location $(x, y)$ of the predicted frame is evaluated, in which both $d_x$ and $d_y$ are floating-point values. Without loss of generality, it is supposed that both $d_x$ and $d_y$ are positive, i.e., the point at location $(x, y)$ is compensated from a pixel in the lower-right direction in the reference frame. One distinguishes the integer part and decimal part of the motion vector as follows:

$$d_x = d_{xi} + d_{xf}, \qquad d_y = d_{yi} + d_{yf} \tag{22}$$

where $(d_{xi}, d_{yi})$ and $(d_{xf}, d_{yf})$ are the integer and decimal parts of $(d_x, d_y)$, respectively. The square at location $(x, y)$ is now reconstructed from the four locations $(x + d_{xi}, y + d_{yi})$, $(x + d_{xi} + 1, y + d_{yi})$, $(x + d_{xi}, y + d_{yi} + 1)$, and $(x + d_{xi} + 1, y + d_{yi} + 1)$ in the reference frame. At these four locations, a reference square is thus divided into four parts, with the area of each part decided by the decimal part, $(d_{xf}, d_{yf})$, of the motion vector $(d_x, d_y)$. The area of each part is viewed as a weighting factor in deciding the new gray value. More precisely, we assign the gray value of the pixel at location $(x, y)$ from the weighted gray values of the four locations (pixels).

Hence, for any location $(x, y)$ in the compensated frame, with associated motion vector $(d_x, d_y)$, the compensated gray value $I_c(x, y)$ is decided by the following equation:

$$I_c(x, y) = \frac{\sum_{(m, n) \in D} w_{(m, n) \to (x', y')} \cdot I_r(m, n)}{\sum_{(m, n) \in D} w_{(m, n) \to (x', y')}} \tag{23}$$

where

$(x', y') \triangleq (x + d_x, y + d_y)$: source pixel of the compensated point;

$D \triangleq \{(m, n) \mid \text{one of the four points that are nearest to the point } (x', y')\}$;

$w_{(m, n) \to (x', y')}$: the area (weighting factor) of the block of $(m, n)$ that falls on the unit square of $(x', y')$.

For any location $(x, y)$ in the compensated image frame, its gray value can be obtained by (23). In (23), the numerator denotes the sum of the products of the weighting factor and the gray value of the points that belong to $D$, and the denominator is the sum of all weighting factors, which is used as the normalization factor in the reconstruction process.
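A minimal sketch of (22) and (23) for a single pixel is shown below. The overlap areas are computed from the decimal parts of the source position; the function name `qca_compensate`, the `image[row, column]` layout, and the choice to skip neighbors outside the frame are assumptions of this example.

```python
import numpy as np

def qca_compensate(ref, x, y, dx, dy):
    """Compensated gray value I_c(x, y) of (23) for a real-valued (dx, dy)."""
    xs, ys = x + dx, y + dy                       # source position (x', y')
    x0, y0 = int(np.floor(xs)), int(np.floor(ys)) # integer parts
    fx, fy = xs - x0, ys - y0                     # decimal parts, as in (22)
    # Four nearest pixels (m, n) and the area of the unit square at (x', y')
    # that overlaps each of them.
    neighbors = [
        (x0,     y0,     (1 - fx) * (1 - fy)),
        (x0 + 1, y0,     fx * (1 - fy)),
        (x0,     y0 + 1, (1 - fx) * fy),
        (x0 + 1, y0 + 1, fx * fy),
    ]
    num = den = 0.0
    for m, n, w in neighbors:
        if 0 <= n < ref.shape[0] and 0 <= m < ref.shape[1]:
            num += w * ref[n, m]
            den += w
    return num / den if den > 0 else 0.0

ref = np.array([[0.0, 10.0],
                [20.0, 30.0]])
center = qca_compensate(ref, 0, 0, 0.5, 0.5)    # equal weights on all four pixels
quarter = qca_compensate(ref, 0, 0, 0.25, 0.0)  # 0.75 * 0 + 0.25 * 10
corner = qca_compensate(ref, 0, 0, 1.0, 1.0)    # integer vector, single pixel
```

When all four neighbors lie inside the frame the weights sum to one, so the normalization in (23) only matters at the frame border.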

E. Simulation Results

In this section, we compare the performance of a total of ten algorithms: conventional block-matching (BM), and the affine, perspective, and polynomial transformations, each with three different learning algorithms (RLS, binary-coding GA, and floating-point GA). We synthesize nine test images of different transformations to evaluate the learning algorithms. Among the nine test images, image 1 is the result of rotation; image 2 is that of scaling; images 3 and 4 are the results of shearing in different directions. The images mentioned above are affine-transformed images. Images 5 and 6 are perspective-transformed images. Images 7 and 8 are more complex images of affine and perspective transformations, respectively. Image 9 is created to test the subpixel prediction ability of the learning algorithms. The quality evaluation between the original image and the reconstructed one is performed under some objective quality criteria. These criteria include total absolute difference error, SNR, and data entropy, which are defined as follows.

Fig. 5. Compensation error comparison between BM and RLS, GA, FGA affine transformations.

• total absolute difference error

$$\mathrm{Error} = \sum_{(x, y)} \bigl|I_t(x, y) - \tilde{I}(x + d_x, y + d_y)\bigr|; \tag{24}$$

• signal-to-noise ratio (SNR)

$$\mathrm{SNR}_{dB} = 10 \log_{10} \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} I^2(i, j)}{\sum_{i=1}^{M} \sum_{j=1}^{N} \bigl(I(i, j) - \tilde{I}(i, j)\bigr)^2}; \tag{25}$$

• estimated entropy of compensation error

$$H = -\sum_{i=0}^{255} p_i \log_2 p_i \tag{26}$$

where $p_i$ is the probability of pixel value $i$ in the difference image.

Note that $I$ and $\tilde{I}$ are the desired image frame and the reconstructed image frame, respectively. The absolute difference error reflects the ability of the learning algorithms in motion estimation and compensation. SNR indicates the visual quality of the reconstructed images. The estimated entropy of the compensation error is included to show the approximate data rate.
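The three criteria (24)–(26) can be computed as in the sketch below. It assumes 8-bit gray-level frames that are already aligned (so the motion offset in (24) has been applied to the reconstructed frame), and it takes the difference image as the absolute difference so that its values fall in 0 to 255; the function names are hypothetical.

```python
import numpy as np

def total_abs_error(orig, recon):
    """Total absolute difference error, as in (24)."""
    return float(np.sum(np.abs(orig.astype(np.float64) - recon.astype(np.float64))))

def snr_db(orig, recon):
    """Signal-to-noise ratio in dB, as in (25)."""
    o = orig.astype(np.float64)
    r = recon.astype(np.float64)
    noise = np.sum((o - r) ** 2)
    if noise == 0:
        return float("inf")
    return float(10.0 * np.log10(np.sum(o ** 2) / noise))

def error_entropy(orig, recon):
    """Estimated entropy (26) of the absolute difference image, in bits/pixel."""
    diff = np.abs(orig.astype(np.int64) - recon.astype(np.int64))
    counts = np.bincount(diff.ravel(), minlength=256)
    p = counts[counts > 0] / diff.size     # skip empty bins (0 log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

orig = np.array([[10, 20], [30, 40]], dtype=np.uint8)
recon = np.array([[10, 22], [30, 38]], dtype=np.uint8)
err = total_abs_error(orig, recon)
snr = snr_db(orig, recon)
ent = error_entropy(orig, recon)
```

In the small example, the difference image takes the values 0 and 2 with equal probability, so the estimated entropy is one bit per pixel.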

In Fig. 5, we compare the performance in terms of compensation error between BM and the affine transformation of generalized block-matching with the RLS, GA (binary-coding GA), and FGA (floating-point GA) learning algorithms. The RLS predicts the frame more accurately than the BM except for two frames. In addition, both the binary-coding and floating-point GA's outperform the RLS and BM, and the floating-point GA is superior to the binary-coding one. Fig. 6 shows the quality of the reconstructed images on the basis of signal-to-noise ratio. As expected, the performance of the GA's is superior to the others. The estimated entropy of the compensation error is shown in Fig. 7. The GA's outperform the other algorithms again. Note that the performance of RLS in terms of SNR and entropy is poor when compared with BM, because the RLS is unable to predict the test images correctly and produces more frame difference errors on border blocks, where motion vectors are prone to error. Compensation errors on border blocks greatly deteriorate the performance of RLS.

Fig. 6. SNR comparison between BM and RLS, GA, FGA affine transformations.

Fig. 7. Entropy comparison between BM and RLS, GA, FGA affine transformations.

Fig. 8. Compensation error comparison between BM and RLS, GA, FGA perspective transformations.

Fig. 8 shows the test result in terms of compensation error between BM and the perspective transformation, instead of the affine transformation, of generalized block-matching with RLS, GA, and FGA learning. The compensation errors of the GA's are again lower than those of the other two algorithms. However, the compensation errors of the binary-coding GA and the floating-point GA are approximately equal in this case. The results of the comparisons on SNR and entropy are shown in Figs. 9 and 10, respectively.


Fig. 9. SNR comparison between BM and RLS, GA, FGA perspective transformations.

Fig. 10. Entropy comparison between BM and RLS, GA, FGA perspective transformations.

We also test the algorithms with polynomial transformation, and the comparison of total absolute difference error is shown in Fig. 11. The performance of BM is the worst when compared with the other three algorithms. The binary-coding GA performs better than the floating-point one, but the RLS algorithm sometimes results in the lowest error. The outcome indicates that the implementation of GA’s is not optimized for learning coefficients of polynomial transformation. The comparison of algorithms on SNR is shown in Fig. 12. The performance of GA’s is better than that of RLS and BM. Learning algorithms are also compared in terms of coding entropy. As depicted in Fig. 13, the BM needs more bits than others. Note that the performance of RLS is superior to that of BM in terms of compensation error and estimated entropy. However, the performance of RLS is similar to that of BM in terms of SNR, because the RLS fails to predict the test images correctly on the border blocks.

Fig. 11. Compensation error comparison between BM and RLS, GA, FGA polynomial transformations.

Fig. 12. SNR comparison between BM and RLS, GA, FGA polynomial transformations.

Fig. 13. Entropy comparison between BM and RLS, GA, FGA polynomial transformations.

In summary, the performance of the GA's in terms of frame difference, SNR, and entropy is superior to that of the other algorithms. This is due to the global optimization capability of the GA's. With the crossover and mutation operations, the GA, unlike the block-matching and RLS algorithms, is able to get out of a local minimum trap and find better local minima or even the global minimum. However, it is observed that the standard GA learning hardly reaches the global minimum, which means that the standard GA lacks the ability of fine local tuning. Three possible reasons account for this situation. First, the GA is trapped in local minima when almost all the individuals in a generation look alike. Second, assume that the global minimum is $m_d$ and the currently located minimum is $m_i$. It is possible that the continuous evolution of the GA's cannot produce an individual (string) which is closer to $m_d$ than $m_i$. Although there may be a chance to reach $m_d$, it is not guaranteed to occur in a few generations. Third, in using the floating-point GA, the global minimum $m_d$ is harder to reach when $m_i$ is closer to $m_d$, due to the possibly large change of floating-point strings applied by the GA operations. If we fix the change of the strings to a small scale, then the number of generations needed to reach $m_d$ will increase. The performance of the standard GA can be improved with tailored mutation operation and fine local tuning. Tailored mutation


reduce the data rate of motion information effectively:

• reduce the number of blocks used in compensation;

• reduce the entropy of transformation coefficients.

Partitioning of input space must operate without increasing total frame difference error, or operate under the criteria which can optimally reduce the total data rate.

A. Partitioning of Input Space Using Parameter Similarity

As pointed out, an efficient way to reduce the data rate of motion information is to use fewer blocks, and thus less motion information, in motion compensation. Under the premise that the total frame error does not increase, we can achieve this goal by classifying those blocks which experience nearly the same transformation as a group. At first, we take the coefficients of a transformation directly as the elements of the pattern. In order to determine whether two transformations are similar or the same, we simply compare the corresponding coefficients of the two transformations. If the difference of each pair of corresponding coefficients is below a threshold value $T_c$, we can consider these two transformations as the same one. The clustering algorithm is described in detail as follows.

Step 1: Initialize the first cluster $C_1 = P_1$, where $P_1$ is the first pattern. Let the elements of $C_i$ and $P_i$ be $c_{ij}$ and $p_{ij}$, $j = 1, 2, \ldots, N$, where $N$ is the number of elements in a pattern.

Step 2: Set $i = 2$.

Step 3: Input pattern $P_i$.

Step 4: For each cluster, calculate the distance $D_{ik}$ between pattern $P_i$ and cluster $C_k$:

$$D_{ik} = \sum_{j} |p_{ij} - c_{kj}|.$$

Step 5: Determine the $c$ for which $D_{ic} < D_{ik}$ for all $k \neq c$.

Step 6: If $D_{ic}$ is less than a predefined constant $T_c$, then pattern $P_i$ is assigned to cluster $c$. Otherwise, a new cluster is created with pattern $P_i$ chosen as the center of the new cluster.

Step 7: If there are more patterns, set $i = i + 1$ and go back to Step 3.
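Steps 1 to 7 can be sketched as a single pass over the patterns. In this illustration the cluster centers are not updated after creation, since the steps above do not describe any update, and ties in Step 5 are broken by taking the first nearest cluster; the function name `cluster_patterns` and the sample data are assumptions.

```python
import numpy as np

def cluster_patterns(patterns, T_c):
    """Sequential clustering of coefficient patterns with threshold T_c.

    Returns the cluster index of each pattern and the list of cluster centers.
    """
    patterns = np.asarray(patterns, dtype=float)
    centers = [patterns[0].copy()]            # Step 1: C_1 = P_1
    labels = [0]
    for p in patterns[1:]:                    # Steps 2, 3, and 7
        # Step 4: sum of absolute coefficient differences to each center
        dists = [float(np.sum(np.abs(p - c))) for c in centers]
        c = int(np.argmin(dists))             # Step 5: nearest cluster
        if dists[c] < T_c:                    # Step 6: assign or create
            labels.append(c)
        else:
            centers.append(p.copy())
            labels.append(len(centers) - 1)
    return labels, centers

patterns = [[1.0, 0.0], [1.05, 0.0], [3.0, 3.0], [1.0, 0.02], [3.1, 3.0]]
labels, centers = cluster_patterns(patterns, T_c=0.5)
```

The outcome depends on the order in which patterns are presented, which is a general property of this kind of one-pass scheme.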

The approach that uses coefficients of transformation as content of a pattern directly suffers from two problems. First, the threshold value Tcis difficult to determine. We need to setTcas large as possible in a reasonable range in order to make this method effective. However, the effect of the coefficients on the transformed point,(x; y), differs as the value of reference point, (u; v), changes. Second, in the perspective transformation, two transformations whose coefficients differ a lot may produce similar deformation, if their coefficients are proportional.

To tackle the problems mentioned above, in an alternative approach, we do not use the coefficients of transformation as elements

to the prediction error and motion parameters, because they operate under entropy criteria. Two entropy criteria for split are stated as follows.

• If the total absolute difference error (or another error measure) of the motion-compensated block is above a preset threshold $T$, the block is split:

$$\mathrm{Error}_{\text{nosplit}} > T \;\Longrightarrow\; \text{split}. \tag{27}$$

• If the extra cost to send additional motion parameters is worth the gain obtained on the frame difference side, then the block is split into four subblocks:

$$n \cdot \left(H_{FD}^{\text{nosplit}} - H_{FD}^{\text{split}}\right) > 4 \cdot H_{\vec{v}}^{\text{split}} - H_{\vec{v}}^{\text{nosplit}} \;\Longrightarrow\; \text{split} \tag{28}$$

where $n$ is the number of pixels in the block, $H_{FD}^{\text{split}}$ and $H_{FD}^{\text{nosplit}}$ are the entropies (26) of the frame difference with and without split, respectively, and $H_{\vec{v}}^{\text{split}}$ and $H_{\vec{v}}^{\text{nosplit}}$ are the entropies (26) of the motion vectors with and without split, respectively.

The merge operation operates in the sense of reducing total data rate. Unlike the combination process mentioned in the previous section, which keeps the compensated error unchanged or decreases it by a small amount, the merge process, which operates under the criteria

$$\mathrm{Error}_{\text{combine}} < T \;\Longrightarrow\; \text{combine} \tag{29}$$

and

$$n \cdot \left(H_{FD}^{\text{combine}} - H_{FD}^{\text{nocombine}}\right) < 4 \cdot H_{\vec{v}}^{\text{nocombine}} - H_{\vec{v}}^{\text{combine}} \;\Longrightarrow\; \text{combine} \tag{30}$$

does not have such a restriction. In brief, if merging blocks, which reduces the number of motion parameters to one-fourth in the quad-tree segmentation but increases the compensation error, achieves a lower data rate, then the blocks are merged into one larger block.
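The entropy criteria (28) and (30) compare a change in frame-difference bits against a change in motion-vector bits. The sketch below illustrates this under stated assumptions: all function names are hypothetical, the entropies are passed in as bits per symbol, and `first_order_entropy` is only an assumed empirical form, since the entropy definition (26) is given elsewhere in the paper.

```python
import math
from collections import Counter


def first_order_entropy(symbols):
    """Empirical entropy in bits per symbol (assumed stand-in for Eq. (26))."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_by_error(error_nosplit, threshold):
    """Criterion (27): split when the unsplit block error exceeds T."""
    return error_nosplit > threshold


def split_by_entropy(n, h_fd_nosplit, h_fd_split, h_v_split, h_v_nosplit):
    """Criterion (28): split when the frame-difference gain beats the vector cost."""
    gain = n * (h_fd_nosplit - h_fd_split)   # bits saved on the frame difference
    cost = 4 * h_v_split - h_v_nosplit       # extra bits for the motion vectors
    return gain > cost


def combine_by_error(error_combine, threshold):
    """Criterion (29): combine when the merged block error stays below T."""
    return error_combine < threshold


def combine_by_entropy(n, h_fd_combine, h_fd_nocombine, h_v_nocombine, h_v_combine):
    """Criterion (30): combine when the added error bits are below the vector saving."""
    loss = n * (h_fd_combine - h_fd_nocombine)   # extra bits on the frame difference
    saving = 4 * h_v_nocombine - h_v_combine     # bits saved on the motion vectors
    return loss < saving


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    print(split_by_entropy(64, 3.0, 2.5, 6.0, 5.0))
    print(combine_by_entropy(256, 2.6, 2.5, 6.0, 7.0))
```

Each criterion is kept as a separate function because the text lists them side by side without stating how they are combined.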

The split operation is also proposed with the goal of reducing the total data rate. However, it works in the reverse direction: it splits a larger block into smaller ones and assigns each resulting block its own set of motion parameters. The criteria in (27) and (28) can serve as criteria for split. The main purpose of the split operation is to prevent premature combination. Because the estimated motion parameters may be incorrect, combining blocks with incorrect parameters corrupts the further learning of these blocks. The split operation is thus proposed to remedy the problem of combination with erroneous motion information.

In comparison with the combination methods proposed in the previous section, the merge and split operations perform more effectively in reducing the data rate, because the number of blocks is effectively reduced by these operations.


Fig. 14. A pair of consecutive images and its motion vectors: (a), (b) moving rectangle, and (c) the motion vectors of the moving rectangle in (a) and (b).

C. Quad-Tree Segmentation

In previous sections, methods of reducing the number of blocks partitioned in an image are introduced. In these methods, images are segmented adaptively into blocks of variable size, which introduces the overhead of representing the size and location of the blocks in an image. When the partition is fixed, there is no overhead. However, when the partition of an image is variable, the overhead consumes part of the bit rate. Therefore, quad-tree segmentation is introduced to reduce the overhead.

Quad-trees which represent the segmentation are typically constructed by top-down or bottom-up methods [11]. In the proposed system, a bottom-up construction is adopted. The procedure of merge in the proposed system is described in Sections III-A and III-B. The quad-tree structure can be represented by treecode [12]. The treecode is encoded by listing the nodes encountered by a depth-first traversal of the tree structure. If an encountered node is not a leaf, it is represented by “1”; otherwise it is represented by “0”. The treecode approach requires exactly one bit of overhead per node, which is used to distinguish between leaf nodes and internal nodes. Therefore, a small overhead rate is achieved with the structure of quad-tree segmentation.
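The treecode can be illustrated with a short sketch. The class `QuadNode` and the functions `treecode` and `decode_treecode` are hypothetical names introduced here; the sketch assumes a pre-order depth-first traversal and a fixed but arbitrary order of the four children, neither of which is specified in the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuadNode:
    # Empty list for a leaf block, exactly four children for a split block.
    children: List["QuadNode"] = field(default_factory=list)


def treecode(node: QuadNode) -> str:
    """Depth-first listing: '1' for an internal node, '0' for a leaf."""
    if not node.children:
        return "0"
    return "1" + "".join(treecode(child) for child in node.children)


def decode_treecode(bits: str) -> QuadNode:
    """Rebuild the quad-tree from its treecode (inverse of treecode)."""
    def parse(pos: int):
        if bits[pos] == "0":
            return QuadNode(), pos + 1
        pos += 1
        kids = []
        for _ in range(4):
            child, pos = parse(pos)
            kids.append(child)
        return QuadNode(kids), pos

    root, end = parse(0)
    if end != len(bits):
        raise ValueError("trailing bits after a complete tree")
    return root


if __name__ == "__main__":
    # Root split into four blocks, the second of which is split again.
    tree = QuadNode([QuadNode(),
                     QuadNode([QuadNode() for _ in range(4)]),
                     QuadNode(),
                     QuadNode()])
    code = treecode(tree)
    print(code, len(code))  # one bit per node: 9 nodes, 9 bits
```

The round trip through `decode_treecode` shows that the single bit per node is enough to recover the partition, since every internal node has exactly four children.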

D. Improvement of Partitioning with Motion Information of Neighbor Blocks

Several methods of partitioning the input space are discussed in the previous sections. They rely on the assumption that the motion information of all blocks is correct, which is not always true in an image sequence. In fact, the motion information which minimizes the compensated error is not necessarily the real motion of objects in an image sequence. We demonstrate this with the following example.

In Fig. 14(c), motion vectors of a “moving rectangle” are shown. The real motion vector of the solid block is $(-4, -7)$. Because the block-matching algorithm finds a best fit for a small reference subblock, for which several candidates are possible, the estimated motion vector can be used to perform compensation perfectly even though it may not be the actual motion vector.

Estimated motion information which differs from the actual motion may be recovered from that of neighbor blocks. Because a moving object in an image sequence is often larger than the minimal block size, the motion information of neighbor blocks is usually the same as, or close to, that of the current block. The concept of global motion is discussed in many studies on motion estimation and related topics. In [13], a method which reconstructs the frame with the aid of neighbor motion vectors is successfully applied to motion compensation.

Fig. 15. The proposed operation that tries to recover the incorrect motion information from the global motion.

We propose a recovery operation which tries to recover the incorrect motion vector from the global motion. In the operation, the parameters of the current block and those of neighbor blocks are tested with motion compensation, and their compensation errors are compared. After comparison, the parameters of this block are replaced by the ones with the lowest error. This operation is illustrated in Fig. 15. The upper-left block, whose motion information is $f_0$, is tested with motion compensation, with its motion information temporarily replaced by $f_1$, $f_2$, or $f_3$, one at a time. After comparing the compensation error of each candidate, the one with the lowest error, $f_k$, becomes the new motion information of the current block, where $k$ is 0, 1, 2, or 3.
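The recovery operation amounts to picking, among the block's own parameters and those of its neighbors, the candidate with the lowest compensation error. The sketch below is a simplified illustration with hypothetical names (`sad_error`, `recover_motion`): it assumes purely translational motion vectors `(dy, dx)` pointing from the current block into the reference frame and uses the sum of absolute differences as the error, whereas the proposed system works with general transformations.

```python
def sad_error(cur, ref, top, left, size, mv):
    """Sum of absolute differences of one block compensated with vector mv."""
    dy, dx = mv
    total = 0
    for y in range(size):
        for x in range(size):
            ry, rx = top + y + dy, left + x + dx
            # Candidates that point outside the reference frame are rejected.
            if not (0 <= ry < len(ref) and 0 <= rx < len(ref[0])):
                return float("inf")
            total += abs(cur[top + y][left + x] - ref[ry][rx])
    return total


def recover_motion(cur, ref, top, left, size, own_mv, neighbor_mvs):
    """Return f_k: the candidate among f_0 (own) and the neighbors with lowest error.

    On ties the earliest candidate wins, so the block keeps its own vector
    unless a neighbor vector is strictly better.
    """
    candidates = [own_mv] + list(neighbor_mvs)
    return min(candidates,
               key=lambda mv: sad_error(cur, ref, top, left, size, mv))


if __name__ == "__main__":
    # Synthetic frames: the current frame is the reference shifted by (1, 2).
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
            for x in range(8)] for y in range(8)]
    print(recover_motion(cur, ref, 2, 2, 2, (0, 0), [(1, 2), (0, 1), (-1, 0)]))
```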

IV. PROPOSED DEFORMED IMAGE MOTION ESTIMATION SYSTEM

We have discussed in the previous sections two aspects of reducing the amount of information required in a block-matching motion estimation and motion-compensation compression algorithm. In the past, incorporating techniques of two aspects simply means segmentation of images followed by matching or by generalized block-matching [14], which lacks the ability of determining as a whole the blocks in the same motion that is more complex than pure translation. In this paper, we propose a system in which those information reduction algorithms can cooperate well.

A. Description of the Proposed System

The architecture of the proposed system is already shown in Fig. 1. The block diagram of the proposed system in a motion estimation and compensation system is shown in Fig. 16. There are two main components in the proposed system. The component of output function and that of input partition are discussed in Sections II and III, respectively. Quad-tree segmentation is adopted in combining and splitting of blocks in the input space.

The operation procedure of the proposed system is described as follows.

Step 1: Input space (compensated image) is partitioned into blocks of minimum size.

Step 2: For each block, learn the associated output function (transformation).

Step 3: The output functions of blocks are replaced and tested with neighbor output function, as described in Section III-D.


Fig. 16. Block diagram of the proposed deformed image motion estimation system.

Fig. 17. Compensation error comparison among GA perspective, FGA affine, FGA perspective, and the proposed system.

Step 4: Classify transformation parameters according to the methods described in Section III-A. If combination is possible on the basis of quad-tree segmentation, then do the combination.

Step 5: Split blocks according to (27) and (28).

Step 6: For each block, relearn the associated output function. Output functions of neighbor blocks and current block are used as initial parameters (for RLS) or initial population (for GA) of learning process.

Step 7: Combine blocks according to (29) and (30).

Step 8: If the ending condition is not reached, go back to Step 3.

Step 9: For each block, verify the total data rate of the block-matching algorithm when it is applied to the block and that of generalized block-matching algorithm. If the data rate of block-matching is lower, then replace the motion information of the block with the motion vector estimated by the block-matching algorithm.

Fig. 18. SNR comparison among GA perspective, FGA affine, FGA perspective, and the proposed system.

Fig. 19. Entropy comparison among GA perspective, FGA affine, FGA perspective, and the proposed system.

Fig. 20. Test images: three objects.

Fig. 21. Test images: the crane system.

Fig. 22. Reconstructed image and partition map of “three objects.”

We explain each step of the operation procedure of the proposed system. In Step 2, GA is adopted to precisely estimate the motion information. In order to prevent GA from being trapped in local minima, Step 3 is adopted to help GA escape from local minima with global motion; moreover, it can speed up the learning of GA in the later relearning. After learning of the output function, Step 4 is introduced to reduce the data rate of motion information, including the data rate of representing the output function and the input partition. Step 5 prevents the premature combination of blocks, as described in Section III-B. Since the input partition and corresponding output function have been modified, the learning of the output function is performed again in Step 6 to further reduce the data rate of compensation error. At this point, we can apply Step 7 to combine blocks in the input space. Step 7 is executed after the other operations because it degrades the accuracy of motion information in exchange for a lower data rate. The ending condition in Step 8 may be that the smallest fitness value falls below a preset threshold, or that the number of generations exceeds a preset value. Finally, in Step 9 we further reduce the total data rate. Since generalized block-matching trades more bits in motion information for fewer bits in compensation error, sometimes the bits saved in compensation error are fewer than those consumed in motion information. Step 9 is proposed to tackle this problem.
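The per-block decision of Step 9 can be written down compactly. The following is only a sketch under assumptions: the class `BlockCoding` and the function `choose_coding` are hypothetical, and the total data rate of a block is taken to be the sum of its motion-information bits and its compensation-error bits.

```python
from dataclasses import dataclass


@dataclass
class BlockCoding:
    method: str          # e.g. "BM" or "GBM" (labels used for illustration)
    motion_bits: float   # bits spent on motion information
    error_bits: float    # bits spent on the compensation error

    @property
    def total_bits(self) -> float:
        return self.motion_bits + self.error_bits


def choose_coding(bm: BlockCoding, gbm: BlockCoding) -> BlockCoding:
    """Step 9: fall back to plain block-matching only when it is strictly cheaper."""
    return bm if bm.total_bits < gbm.total_bits else gbm


if __name__ == "__main__":
    # Illustrative numbers only: GBM saves 20 error bits but costs 30 more motion bits.
    bm = BlockCoding("BM", motion_bits=10, error_bits=100)
    gbm = BlockCoding("GBM", motion_bits=40, error_bits=80)
    print(choose_coding(bm, gbm).method)
```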

In brief, the algorithms introduced in the previous two sections are effective in reducing data rate, but they have restrictions or drawbacks. The proposed system integrates these algorithms and further improves the performance of each algorithm with the help of the other algorithms. It is shown to be superior to the conventional techniques by simulations in the following section.

B. Simulation Results

Nine test images used in Section II-E are tested by the proposed system. In the test, the floating-point GA learning is adopted in the proposed system. Comparisons among the proposed system and GA perspective, floating-point GA affine, floating-point GA perspective learning algorithms are shown in Figs. 17–19. The proposed system is obviously superior to other algorithms.

We then test the proposed system with two pairs of consecutive images. The first pair of images is “three objects,” in which three objects rotate in different directions and by different angles, as shown in Fig. 20. The second pair of images is “the crane system” shown in Fig. 21.


Fig. 23. Reconstructed image and partition map of “the crane system.”

TABLE I

COMPENSATED RESULTS OF THE “THREE OBJECTS” IMAGE

TABLE II

COMPENSATED RESULTS OF THE “THE CRANE SYSTEM” IMAGE

TABLE III

DATA RATE COMPARISON BETWEEN THE PROPOSED SYSTEM AND CONVENTIONAL BLOCK-MATCHING TECHNIQUE ON TWO TEST IMAGE SEQUENCES

The “three objects” images are artificial and the illumination of objects and background does not change. The other pair is taken from real image sequences, and the illumination of the image contents varies. The reconstructed images and partition maps of these images obtained with the proposed system are shown in Figs. 22 and 23. The performance of the proposed system is compared with that of BM and other affine transformation learning algorithms in Tables I and II. We can conclude that the proposed system is effective in reducing compensation error even when the image sequences are not intensity-invariant. However, the proposed system needs more computation time than the others. If the system is applied to one minute of video, it will take about 3 246 000 s on a Pentium II 266 MHz (frame size = 256 × 256, frame rate = 10).
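To put the quoted computation time in perspective, the arithmetic below converts it to a per-frame cost. It assumes that the frame rate of 10 means 10 frames per second, so that one minute of video contains 600 frames; the symbols $N_{\text{frames}}$, $t_{\text{frame}}$, and $t_{\text{total}}$ are introduced here for illustration only.

```latex
\begin{align*}
N_{\text{frames}} &= 60\,\text{s} \times 10\,\text{frames/s} = 600,\\
t_{\text{frame}}  &= \frac{3\,246\,000\,\text{s}}{600} = 5410\,\text{s} \approx 1.5\,\text{h},\\
t_{\text{total}}  &= \frac{3\,246\,000\,\text{s}}{86\,400\,\text{s/day}} \approx 37.6\,\text{days}.
\end{align*}
```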

The comparison in terms of required data rate between the block-matching algorithm and the proposed system is summarized in Table III. The data rates of the error image and of the motion information are effectively reduced in the proposed system. The overhead of image segmentation is small due to the quad-tree segmentation. Though the motion information of one block in the proposed system requires more bits than that of block-matching, the total data rate of motion information decreases because the number of blocks is effectively reduced. In “mobile calendar,” the data rate of motion information is much higher than that of block-matching, because only a small number of blocks is combined. However, the decrease in the data rate of the error image exceeds the increase in that of the motion information, which results in a reduced total data rate.

TABLE IV

SUMMARY OF THE FEATURES AND COMPARISON OF THE PROPOSED ALGORITHM

V. CONCLUSION

A system for deformed image motion estimation and compensation is proposed in this paper. The system is composed of several components and algorithms which reduce, in different aspects, the total data rate of image compression. Table IV summarizes the features of the proposed algorithms. The component concerned with learning of the output function serves to reduce inter-frame redundancy. To learn the output function, we propose the RLS and GA learning of generalized block-matching. To supplement generalized block-matching, we present a quarter compensation algorithm which can reconstruct the predicted frame according to the motion information of generalized block-matching. The component focused on partitioning of the input space serves to reduce the data rate of motion information. Segmentation algorithms of variable-size block-matching are proposed to reduce the number of parameters. In addition, we introduce the concept of global motion, which helps to find the motion of blocks and consequently improves the segmentation of the image and reduces the data rate. The proposed system successfully integrates these two components into a whole system. We discuss the drawbacks of each algorithm, and explain how to eliminate the drawbacks with the help of the other algorithms.

Simulation results show that the proposed system effectively reduces the data rate in image compression.

REFERENCES

[1] G. Tziritas and C. Labit, Motion Analysis for Image Sequence Coding. Amsterdam, The Netherlands: Elsevier, 1994.

[2] A. N. Netravali and J. D. Robbins, “Motion compensated television coding part I,” Bell Syst. Tech. J., vol. 58, no. 3, pp. 629–668, 1979.

[3] A. N. Netravali and B. G. Haskell, Digital Pictures Representation and Compression. New York: Plenum, 1991.

[4] V. Seferidis and M. Ghanbari, “General approach to block-matching motion estimation,” Opt. Eng., vol. 32, no. 7, pp. 1464–1474, 1993.

[5] M. H. Chan, Y. B. Yu, and A. G. Constantinides, “Variable size block matching motion compensation with applications to video coding,” Proc. Inst. Elect. Eng., vol. 137, no. 4, pp. 205–212, 1990.

[6] F. Dufaux and M. Kunt, “Multigrid block matching motion estimation with an adaptive local mesh refinement,” in Proc. SPIE Visual Communication Image Processing’92, Boston, MA, Nov. 1992, vol. 1818, pp. 97–109.

[7] F. Dufaux and F. Moscheni, “Motion estimation techniques for digital TV: A review and a new contribution,” Proc. IEEE, vol. 83, pp. 858–875, June 1995.

[8] J. H. Holland, Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press, 1992.

[9] D. E. Goldberg, Genetic Algorithm in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[10] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. New York: Springer-Verlag, 1994.

[11] T. Pavlidis, Algorithms for Graphics and Image Processing. Rockville, MD: Computer Science, 1982.


M. Sezan and R. Lagendijk, Eds. Norwell, MA: Kluwer, 1993, pp. 447–481.

[18] K. W. Chun and J. B. Ra, “An improved block matching algorithm based on successive refinement of motion vector candidates,” Signal Processing: Image Com., pp. 115–122, 1994.

[19] Y. H. Pao, Adaptive Pattern Recognition and Neural Networks. Reading, MA: Addison-Wesley, 1989.

[20] J. R. Jain and A. K. Jain, “Displacement measurement and its applications in interframe coding,” IEEE Trans. Commun., vol. COM-29, pp. 1799–1808, Dec. 1981.

POPFNN-AARS(S): A Pseudo Outer-Product Based Fuzzy Neural Network

C. Quek and R. W. Zhou

Abstract— A novel fuzzy neural network, the Pseudo Outer-Product-based Fuzzy Neural Network using the singleton fuzzifier together with the Approximate Analogical Reasoning Schema, is proposed in this paper. This network shall henceforth be referred to as the singleton fuzzifier POPFNN-AARS. The singleton fuzzifier POPFNN-AARS employs the Approximate Analogical Reasoning Schema (AARS) [13] instead of the commonly used Truth Value Restriction (TVR) method [19]. This makes the structure and learning algorithms of the singleton fuzzifier POPFNN-AARS simpler and conceptually clearer than those of the POPFNN-TVR model [20]–[22]. Different Similarity Measures (SM) and Modification Functions (MF) [23] for AARS are investigated. The structures and learning algorithms of the proposed singleton fuzzifier POPFNN-AARS are presented. Several sets of real-life data are used to test the performance of the singleton fuzzifier POPFNN-AARS and their experimental results are presented for detailed discussion.

Index Terms— Approximate analogical reasoning schema (AARS), fuzzy rule identification, integrated fuzzy neural networks, modification functions, one-pass learning, pseudo outer-product learning, similarity measures, singleton fuzzifier POPFNN-AARS.

I. INTRODUCTION

Zhou and Quek [20], [21] proposed the structure and learning algorithms of the pseudo outer-product based fuzzy neural network using the truth value restriction method (POPFNN-TVR). This novel fuzzy neural network has successfully been applied in an automatic

Manuscript received August 7, 1997; revised March 1, 1999. This paper was recommended by Associate Editor L. O. Hall.

The authors are with the Intelligent Systems Laboratory, Nanyang Technological University, Singapore 639798.

Publisher Item Identifier S 1083-4419(99)08051-6.

of the TVR method. A brief description of the AARS fuzzy inference model and its similarity measures (SM’s) as well as modification functions (MF’s) is given in the following section. Section III briefly contrasts the TVR and AARS fuzzy inference models. The structures and learning algorithms of the proposed singleton fuzzifier POPFNN-AARS’s are presented in Sections IV and V, respectively. Section VI discusses the experimental results and analysis on the performance of the proposed fuzzy neural network, and Section VII concludes the work in this paper together with a brief description of the family of POPFNN architectures developed at the Intelligent Systems Laboratory.

II. APPROXIMATE ANALOGICAL REASONING SCHEMA (AARS)

The approximate analogical reasoning schema (AARS) was proposed by Turksen and Zhong [13] as an alternative to the commonly used compositional rule of inference (CRI) [18] and the truth value restriction (TVR) method [19]. It exhibits the advantages of fuzzy set theory and analogical reasoning in expert systems development.

Given an observed fact $A'$ and a simple fuzzy rule “if $A$ then $B$,” the basic idea of AARS is to modify the consequence $B$ of the fuzzy rule according to the closeness of the observed fact $A'$ to the antecedent $A$. If they are close (similar) enough, in comparison to a threshold value, then the rule can be fired and the conclusion $B'$ can be deduced using some modification techniques. Formally, their “closeness” is expressed as a similarity measure (SM) that is in turn obtained from a distance measure (DM) [23]. Once the similarity measure $SM(A, A')$ between $A$ and $A'$ exceeds the value of the threshold, the fuzzy rule is fired. A modification function (MF) is subsequently constructed and is used to modify the consequence $B$ of the fuzzy rule to deduce a conclusion $B'$, instead of using Zadeh’s CRI. The whole fuzzy inference process of using AARS is shown in Fig. 1. Brief introductions on different SM’s and MF’s are presented in Sections II-A and II-B, respectively.

A. Similarity Measures

The notion of similarity plays a fundamental role in theories of knowledge and behavior. The theoretical analysis of similarity relations has been dominated by geometric models. These models represent objects as points in some coordinate space such that the observed dissimilarity among objects corresponds to the metric distance between the respective points. Many measures of similarity among fuzzy sets have been proposed in the literature, and some have been incorporated into linguistic approximation procedures. In Turksen and Zhong’s paper [13], the similarity measure (SM) between fuzzy sets is defined as a measurement transformed from a distance measure (DM) by using SM = 1/(1 + DM). In 1987, Zwick et al.