An efficient parallel motion estimation algorithm for digital image processing


Liang-Gee Chen, Wai-Ting Chen, Yen-Shen Jehng, and Chih-Ta Chuch

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10764, R.O.C.

This paper presents an efficient parallel block matching algorithm called the parallel hierarchical one-dimensional search for motion estimation. This algorithm is based on the assumptions made by the one-at-a-time search and 2-D logarithmic search. Instead of finding the two-dimensional motion vector directly, it finds two one-dimensional displacements in parallel on the two axes independently within the search area. The major feature of this algorithm lies in the fact that its search speed for the motion vector is faster than that of the other search algorithms on account of its simpler computations and parallelism.

I. Introduction

In digital image processing applications such as video conferencing, the high correlation between consecutive frames can be exploited more efficiently by considering the displacements of moving objects in the coding process. Therefore, in any motion-compensated coding scheme, the performance of the real-time system depends heavily on the accuracy and speed of the motion estimation.

The block matching algorithm (BMA) is much more readily realizable because of its computational simplicity [2]. However, one difficulty with the BMA is the extensive computation required by the full search (FS). To alleviate the computational overhead of the FS, several techniques have been developed, such as the three-step hierarchical search (3SHS) (adopted by [5]), the one-at-a-time search (OTS) [1], the 2-D logarithmic search (LOGS) [2], and a modified version of it (MLOGS) [1]. Although these strategies reduce the computational complexity, they suffer from irregular control flow when hardware realization is taken into consideration.

The objective of this paper is to propose an efficient hardware-oriented BMA that involves considerably simpler arithmetic and regular control flow. The search procedure and properties of the proposed algorithm are described in Section II. The simulation results from applying the proposed algorithm and other BMAs to a video sequence, evaluated under four measures, are reported in Section III. Conclusions are presented in Section IV.

II. The Proposed Algorithm

To reduce the computational complexity, all the search algorithms except the FS assume that the distortion increases monotonically as the searched point moves away from the direction of minimum distortion [4]. Based on the assumptions made by the OTS [1] and the LOGS, the parallel hierarchical one-dimensional search (PHODS) is presented to reduce the number of sequential steps and search points. Under such assumptions, little effort need be spent on locating the exact two-dimensional displacement of the moving object directly. Instead of searching for the two-dimensional motion vector directly, the PHODS locates two one-dimensional displacements in parallel on the two axes (say x and y) independently within the search area. Thus the motion vector (Δx, Δy) is obtained from the results of the x-axis search (Δx, 0) and the y-axis search (0, Δy) concurrently. The mean absolute error (MAE) is chosen as the matching criterion [2], [3] for the PHODS and all the algorithms compared in the simulation because of its lower computational complexity [1].
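The paper does not write out the MAE, so the following is only a minimal sketch of the conventional definition: the average absolute pel difference between a block of the current frame and the block of the previous frame displaced by (dx, dy). The function name `mae`, its argument layout, and the choice to return infinity for candidates that fall outside the frame are assumptions made for illustration.

```python
def mae(cur_block, ref_frame, top, left, dx, dy):
    """Mean absolute error for one candidate displacement (dx, dy).

    cur_block : list of rows of pel values from the current frame
    ref_frame : list of rows of pel values from the previous frame
    top, left : position of cur_block's upper-left pel in the current frame
    """
    rows, cols = len(cur_block), len(cur_block[0])
    r0, c0 = top + dy, left + dx  # upper-left pel of the candidate block
    # Assumption: a candidate that leaves the frame is never selected.
    if r0 < 0 or c0 < 0 or r0 + rows > len(ref_frame) or c0 + cols > len(ref_frame[0]):
        return float("inf")
    total = 0
    for r in range(rows):
        for c in range(cols):
            total += abs(cur_block[r][c] - ref_frame[r0 + r][c0 + c])
    return total / (rows * cols)
```

Only subtractions, absolute values, and additions are needed per pel, which is the computational simplicity the text refers to.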

The search procedure of the PHODS, as illustrated in Fig. 1, is described as follows:

initialization:
    S = 2^⌊log2 p⌋;    /* step size S=4 when p=4-7 */
    Δx = 0; Δy = 0;

while ( S is larger than zero )
    /* Search loop, executed ⌊log2 p⌋+1 times */
    /* ⌊.⌋ is a lower integer truncation function */
    IN PARALLEL
        x-axis: (Δx,0) <-- the location of min( D(Δx-S,0), D(Δx,0), D(Δx+S,0) );
        y-axis: (0,Δy) <-- the location of min( D(0,Δy-S), D(0,Δy), D(0,Δy+S) );
    S = S/2;
end while-loop;

the motion vector is (Δx,Δy);
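The search procedure above can be sketched in Python as follows. This is an illustrative sequential rendering, not the authors' implementation: the name `phods`, the callable `distortion(dx, dy)` standing in for D, and the rule that the current centre wins ties are all assumptions. The two axis searches are written one after the other, but each reads only its own coordinate, which is what allows them to run in parallel.

```python
def phods(distortion, p):
    """Sketch of the parallel hierarchical one-dimensional search.

    distortion : callable (dx, dy) -> number, the matching criterion D
    p          : maximum displacement, a positive integer
    """
    step = 1 << (p.bit_length() - 1)  # S = 2^floor(log2 p); 4 for p = 4..7
    dx = dy = 0
    while step > 0:
        # x-axis search: three candidates on the x axis (y fixed at 0).
        # Listing the centre first makes it win ties (an assumption).
        new_dx = min((dx, dx - step, dx + step), key=lambda x: distortion(x, 0))
        # y-axis search: independent of the x-axis search above.
        new_dy = min((dy, dy - step, dy + step), key=lambda y: distortion(0, y))
        dx, dy = new_dx, new_dy
        step //= 2  # integer halving: 4, 2, 1, then 0 ends the loop
    # As in the pseudocode, no clamping to the range [-p, p] is performed.
    return dx, dy
```

For example, with a hypothetical distortion `lambda x, y: abs(x - 3) + abs(y - 6)` and p = 7, the x-axis search visits 4, 4, 3 and the y-axis search visits 4, 6, 6, giving the vector (3, 6).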

Table 1 presents a comparison of the number of search points and sequential steps required by the algorithms mentioned above. For a real-time hardware realization, the number of required sequential steps can be a more important feature than the number of search points, since some of the search points can be evaluated in parallel.
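As a rough illustration of these two quantities, the counts for the PHODS can be derived from the search loop itself. The sketch below is a derivation from the pseudocode, not a reproduction of Table 1. It assumes that the distortion at a retained centre is reused rather than recomputed, so the first step costs five distinct points (three per axis, sharing the origin) and each later step costs four new ones. The function names are hypothetical, and the full-search count assumes a square search range of ±p pels.

```python
def phods_cost(p):
    """Return (sequential steps, distinct search points) for the PHODS."""
    steps = p.bit_length()         # floor(log2 p) + 1 loop iterations
    points = 5 + 4 * (steps - 1)   # 5 in the first step, 4 new ones afterwards
    return steps, points


def full_search_points(p):
    """Number of candidate displacements in a full search over [-p, p]^2."""
    return (2 * p + 1) ** 2


print(phods_cost(7))          # (3, 13)
print(full_search_points(7))  # 225
```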

Furthermore, the PHODS possesses the following features: 1) regularity: the fixed number of sequential steps leads to simple control and regular data flow when real-time hardware implementation is considered; 2) simplification: the smaller number of search points required per sequential step reduces the computational overhead; 3) parallelism: the x- and y-axis searches can be processed in parallel, which shortens the motion vector search time.

Since the computations of all the algorithms, except the FS, are not continuous processes, any step must complete its computation before the next step begins. From the hardware point of view, the inherent feature of the discontinuity of all the algorithms will delay the search time for the motion vector because of the hardware latency. Fortunately, owing to the parallelism of the PHODS, the problem could be solved easily by pipeline interleaving the x- and y-axis searches when hardware realization is taken into consideration.

III. Simulation Results and Performance Comparisons

To test the performance of the PHODS, a sequence (a speaker with slow movements) of 16 frames, sampled at a frame rate of 12.5 Hz, with 256×256 pels per frame and 8-bit resolution per pel, is used in the simulation. The simulation is performed on an IBM PC/AT personal computer. Six search algorithms, namely i) FS, ii) 3SHS, iii) LOGS, iv) MLOGS, v) OTS, and vi) PHODS, are compared in terms of the following four measures: 1) PSNR (peak-to-peak SNR), shown in Fig. 2; 2) entropy, shown in Fig. 3; 3) the percentage of unpredictable pels (pels with absolute prediction errors larger than three, over a range of 255, are classified as unpredictable pels [2]), shown in Fig. 4; 4) search time (CPU time only), shown in Fig. 5.
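The first three measures can be sketched as below. The paper gives no formulas, so these are the conventional definitions and should be read as assumptions: PSNR as 10·log10(255² / MSE) of the prediction errors, entropy as the first-order entropy of the error values in bits per pel, and the unpredictable-pel percentage using the threshold of three stated above. The function names and the flat list `errors` of per-pel prediction errors are hypothetical.

```python
import math
from collections import Counter


def psnr(errors, peak=255):
    """Peak signal-to-noise ratio in dB of a list of prediction errors."""
    mse = sum(e * e for e in errors) / len(errors)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)


def entropy(errors):
    """First-order entropy of the prediction-error values, in bits per pel."""
    n = len(errors)
    return -sum((c / n) * math.log2(c / n) for c in Counter(errors).values())


def unpredictable_percentage(errors, threshold=3):
    """Percentage of pels whose absolute prediction error exceeds threshold."""
    return 100 * sum(abs(e) > threshold for e in errors) / len(errors)
```

For the toy input `[0, 0, 4, -4]`, the entropy is 1.5 bits per pel and half of the pels are classified as unpredictable.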

IV. Conclusions

In this paper, an efficient parallel search algorithm is presented for motion estimation. It is computationally simpler and performs about the same as the other motion estimation techniques reported. Most of the hardware proposed recently is applicable to the FS only [6], [7], since the other search strategies require control overhead for their non-regular data flow and their search times for the motion vectors are delayed by hardware latency. Thanks to the parallelism of the PHODS, the problem caused by the inherent discontinuity of the search strategies can be relieved. Furthermore, the regularity, simplification, and parallelism of the PHODS strongly imply that it is well suited to efficient VLSI implementation when used in a low-bit-rate video coder. Finally, a systolic architecture based on the PHODS is currently under development for real-time motion estimation.

V. References

1. R. Srinivasan and K. R. Rao, "Predictive Coding Based on Efficient Motion Estimation," IEEE Trans. Commun., vol. COM-33, no. 8, pp. 888-896, Aug. 1985.

2. H. G. Musmann, P. Pirsch, and Hans-Joachim Grallert, "Advances in Picture Coding," Proc. IEEE, vol. 73, no. 4, pp. 523-548, 1985.

3. H. Gharavi and Mike Mills, "Blockmatching Motion Estimation Algorithms-New Results," IEEE Trans. Circuits and Systems, vol. 37, no. 5, pp. 649-651, May 1990.

4. A. N. Netravali and B. G. Haskell, Digital Pictures Representation and Compression. New York: AT&T Bell Lab., 1988.

5. Ronald Plompen, Yoshinori Hatori, Wilfried Geuen, Jacques Guichard, Mario Guglielmo, and Harald Brusewitz, "Motion Video Coding in CCITT SG XV-The Video Source Coding," (Globecom88), pp. 997-1004, 1988.

6. T. Komarek and P. Pirsch, "Array Architectures for Block Matching Algorithms," IEEE Trans. Circuits and Systems, vol. CAS-36, no. 10, pp. 1301-1308, Oct. 1989.

7. K. M. Yang, M. T. Sun, and L. Wu, "A Family of VLSI Designs for the Motion Compensation Block-Matching Algorithm," IEEE Trans. Circuits and Systems, vol. CAS-36, no. 10, pp. 1317-1325, Oct. 1989.

Fig. 1. The parallel hierarchical one-dimensional search procedure. The motion vector is (3,6) in this example.

Fig. 2. PSNR comparison on prediction errors for algorithms I-VI (horizontal axis: frame number). I) FS. II) 3SHS. III) LOGS. IV) MLOGS. V) OTS. VI) PHODS.

examples in Fig. 1-5 worst case for p=7

Fig. 3. Entropy of prediction errors for algorithms I-VI (horizontal axis: frame number). I) FS. II) 3SHS. III) LOGS. IV) MLOGS. V) OTS. VI) PHODS.

Fig. 4. Percentages of unpredictable pels for algorithms I-VI (horizontal axis: frame number). I) FS. II) 3SHS. III) LOGS. IV) MLOGS. V) OTS. VI) PHODS.

Fig. 5. Search times for algorithms II-VI on an IBM PC/AT (horizontal axis: frame number). II) 3SHS. III) LOGS. IV) MLOGS. V) OTS. VI) PHODS.

*The search time of the PHODS should be reduced by 1/2 when parallel processing is considered.

Table 1. Comparisons of the number of sequential steps and search points for algorithms II-VI. a) Required number of sequential steps. b) Required number of search points.