ANALYSIS AND HARDWARE ARCHITECTURE FOR GLOBAL MOTION ESTIMATION
IN MPEG-4 ADVANCED SIMPLE PROFILE
Shao-Yi Chien, Ching-Yeh Chen, Wei-Min Chao,
Yu-Wen Huang, and Liang-Gee Chen
DSP/IC Design Lab
Graduate Institute of Electronics Engineering and Department of Electrical Engineering
National Taiwan University
1, Sec. 4, Roosevelt Rd., Taipei 106, Taiwan
{shoayi, cychen, hydra, yuwen, lgchen}@video.ee.ntu.edu.tw
ABSTRACT
Global motion estimation (GME) and compensation is one of the key modules in MPEG-4 Advanced Simple Profile (ASP). However, no hardware architectures for GME exist, since existing algorithms are not suitable for hardware implementation. In this paper, GME in MPEG-4 ASP is analyzed, and a hardware-oriented GME algorithm is proposed based on the analysis results; it combines a feature points based algorithm with a novel iterative sprite point matching algorithm. The associated hardware architecture is also proposed. Simulation results show that the performance of the proposed algorithm is the same as that of the MPEG-4 Verification Model. The implementation results show that the proposed hardware architecture meets the requirements of MPEG-4 ASP@L3 with 66K gates and 31Kb of internal memory at a working frequency of 100 MHz.
1. INTRODUCTION
Advanced Simple Profile (ASP) is defined in the Streaming Video Profile in Amendment 4 of MPEG-4 [1]. Compared with MPEG-4 Simple Profile (SP), ASP is intended for devices with high processing power, and it can save more than 50% of the bits [2].
The most important compression tools in MPEG-4 ASP are B-frames, quarter-pel motion estimation/compensation (QME/QMC), and global motion estimation/compensation (GME/GMC). State-of-the-art ASP hardware encoders do not implement GME, since its computational complexity is too high and no algorithms suitable for hardware implementation exist.
Many global motion estimation algorithms have been proposed. They can be classified into three types: frame matching, differential techniques, and feature points based algorithms [3]. The frame matching algorithm matches the whole frame with candidate motion parameters to find the global motion vector [4]. The differential method employs a Taylor series to expand a criterion function, such as the difference between the current frame and the global motion compensated frame, into polynomial equations [5]. Both of these kinds of algorithms have large computational complexity. The feature points based algorithms [6] first find the motion vectors of the feature points, and the global motion vector is then derived by regression. The computational complexity of this kind of algorithm is much smaller than that of the former two kinds, but the resulting motion vectors are less accurate.
Figure 1: In MPEG-4, GMC parameters are transmitted with the positions of the sprite points.
Instead of transmitting the global motion parameters directly, MPEG-4 transmits the global motion information as the trajectories of reference points in order to handle the quantization error during compression. In ASP, the reference points are set at the four corners of a frame, as shown in Fig. 1. One, two, three, or four reference points are used for the translational, isotropic, affine, or perspective model, respectively. For every frame, the corresponding positions of the reference points in the reference frame, denoted as sprite points in Fig. 1, are transmitted. In MPEG-4, only half-pixel precision is allowed for the sprite points. We think the positions of the sprite points should be considered for GME in MPEG-4 ASP to avoid redundant computation for precision higher than half pixel; that is, GME can be applied directly in the sprite point domain rather than in the motion parameter domain.
In this paper, we first analyze GME in MPEG-4 ASP to find the optimal way to implement it. Based on the analysis, a new algorithm suitable for hardware implementation and the associated hardware architecture are proposed in Sections 3 and 4. The simulation and implementation results are shown in Section 5. Finally, Section 6 concludes this paper.
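To make the sprite point representation concrete, the following Python sketch converts four isotropic motion parameters into the two sprite points used by the isotropic model, quantized to half-pixel precision, and back again. The model form x' = a*x - b*y + c, y' = b*x + a*y + d, the choice of the top-left (0, 0) and top-right (width, 0) corners as the two reference points, and all function names are assumptions for illustration, not the exact MPEG-4 bitstream syntax.

```python
import math

def to_half_pel(v: float) -> float:
    """Round v to the nearest multiple of 0.5 (ties round up)."""
    return math.floor(v * 2 + 0.5) / 2

def sprite_points(a, b, c, d, width):
    """Isotropic parameters -> two half-pel sprite points.

    Assumed model: x' = a*x - b*y + c, y' = b*x + a*y + d.
    Assumed reference points: (0, 0) and (width, 0).
    """
    p0 = (c, d)                          # image of the top-left corner
    p1 = (a * width + c, b * width + d)  # image of the top-right corner
    return tuple((to_half_pel(x), to_half_pel(y)) for x, y in (p0, p1))

def isotropic_params(p0, p1, width):
    """Inverse mapping: two sprite points -> (a, b, c, d)."""
    a = (p1[0] - p0[0]) / width
    b = (p1[1] - p0[1]) / width
    return a, b, p0[0], p0[1]

# Example: a 1.3-pel pan with a slight vertical drift on a CIF-width frame.
print(sprite_points(1.0, 0.0, 1.3, -0.2, 352))  # ((1.5, 0.0), (353.5, 0.0))
```

Because only half-pel sprite point positions can be transmitted, any parameter precision finer than what survives this rounding is wasted effort, which is the motivation for searching in the sprite point domain.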
2. ANALYSIS OF GLOBAL MOTION ESTIMATION AND COMPENSATION IN MPEG-4
In this section, GME in MPEG-4 ASP is analyzed to show its coding efficiency and to find the parameters for implementation.
Several standard sequences are tested. From the rate-PSNR curves, we find that GME sometimes has almost no coding gain, consistent with the results of [2]; however, GME is beneficial for some sequences when global motion such as zooming and rotation is involved. A typical zooming case is shown in Fig. 2, where the first 30 frames of Table Tennis in CIF format are used as the test sequence.
Figure 2: Rate-PSNR curves of sequence Table Tennis, where the curves of no GMC and of GMC with two-, four-, and six-parameter motion models are shown.
When an ASP Level 3 encoder is considered, it requires about 750 Kbps to achieve 35 dB PSNR with GMC and 810 Kbps without GMC; that is, GME reduces the bitstream size by 7.4%. In addition, the global motion information can support other high-level video signal processing applications, such as the global motion descriptor in MPEG-7, scene change detection, and video segmentation. Therefore, GME should be included in the implementation of the MPEG-4 ASP encoder. On the other hand, the run-time profile shows that GME takes 34.8% of the computation time. This large run-time comes from the complex gradient-descent GME algorithm, its many iterations, and the differential operations with respect to all the motion parameters in each iteration. Hence, hardware implementation of GME is necessary.
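The 7.4% saving follows directly from the two bit rates quoted for 35 dB PSNR:

```latex
\frac{810\ \text{Kbps} - 750\ \text{Kbps}}{810\ \text{Kbps}} = \frac{60}{810} \approx 0.074 = 7.4\%
```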
We also examine the performance of ASP with two-, four-, and six-parameter motion models. The results show that the four- and six-parameter models perform similarly, as shown in Fig. 2. The reason may be that in most cases only panning, tilting, rotation, and zooming camera motions occur between two successive frames, so four parameters suffice for most applications. On the other hand, although the two-parameter model performs similarly to the four- and six-parameter models in some cases, in most cases it performs similarly to ASP without GMC, as shown in Fig. 2. We believe this is because the DPCM compression scheme for local motion vectors already achieves the same performance as GMC when only translational camera motion occurs. For that reason, the number of motion parameters should be at least four, and the four-parameter (isotropic) model is selected in this paper.
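The argument above can be illustrated with a small Python sketch of the three motion models. The parameter names, and the observation that the isotropic model is the affine model with a1 == a5 and a2 == -a4, are written out here for illustration. The last function mimics a two-parameter (translational) estimate by averaging displacements and shows that it sees no motion at all for a zoom about the frame center.

```python
def translation(x, y, tx, ty):
    """Two-parameter model."""
    return x + tx, y + ty

def isotropic(x, y, a, b, c, d):
    """Four-parameter model: zoom/rotation (a, b) plus translation (c, d)."""
    return a * x - b * y + c, b * x + a * y + d

def affine(x, y, a0, a1, a2, a3, a4, a5):
    """Six-parameter model."""
    return a0 + a1 * x + a2 * y, a3 + a4 * x + a5 * y

def mean_displacement(src, dst):
    """Two-parameter fit by averaging displacements, as a translational estimate would."""
    n = len(src)
    return (sum(q[0] - p[0] for p, q in zip(src, dst)) / n,
            sum(q[1] - p[1] for p, q in zip(src, dst)) / n)

# Isotropic motion is affine motion with a1 == a5 and a2 == -a4.
assert isotropic(4, 2, 1.5, 0.25, 3, -1) == affine(4, 2, 3, 1.5, -0.25, -1, 0.25, 1.5)

# A 1.5x zoom about the center: every point moves, yet the mean displacement is zero.
corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
zoomed = [isotropic(x, y, 1.5, 0.0, 0.0, 0.0) for x, y in corners]
print(mean_displacement(corners, zoomed))  # (0.0, 0.0)
```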
3. PROPOSED HARDWARE-ORIENTED ALGORITHM
The three types of GME algorithms, namely differential techniques, feature points based algorithms, and frame matching, all have drawbacks for hardware implementation. The differential algorithm (DA) contains many floating-point operations in the differential value calculation, inverse matrix operations are involved in the linear regression, and the amount of memory access is large because of the iterations. In feature points based algorithms (FPBA), the computational load is much lower; however, floating-point and inverse matrix operations are also required in the linear regression, a large sorting element is required for feature point selection, and the motion parameters are not as accurate as those of the other algorithms. In frame matching algorithms (FMA), the computation is more regular, and floating-point calculation in GMC can be avoided with a predefined GMC precision [8].
Although FMA is more suitable for hardware implementation, its drawbacks are that the search step size is hard to decide and that the amounts of computation and memory access are enormous.
We propose a new GME algorithm named the sprite point matching algorithm (SPMA). Since MPEG-4 transmits global motion parameters as the locations of sprite points, as described in Section 1, the optimal sprite points are searched for directly instead of the optimal motion parameters. For the isotropic model, two sprite points, $(x_0', y_0')$ and $(x_1', y_1')$, are used, and the operations can be expressed as follows:
$$ SAD(t_0', t_1') = \sum_{t \in F} \left| CF(t) - RF\big(W(t, t_0', t_1')\big) \right| \tag{1} $$
$$ (\hat{t}_0', \hat{t}_1') = \arg\min_{t_0' \in \text{search range},\; t_1' \in \text{search range}} SAD(t_0', t_1') \tag{2} $$
where CF denotes the current frame, RF denotes the reference frame, $t = (x, y)$, $t_0' = (x_0', y_0')$, $t_1' = (x_1', y_1')$, F is the set of all pixels in the current frame, and $W(t, t_0', t_1')$ denotes the warped location of t according to the sprite points.
The computational complexity of SPMA is too large for hardware implementation, because a GMC operation must be applied to the whole frame for each candidate pair of sprite points. Therefore, a fast algorithm named the iterative sprite point matching algorithm (ISPMA) is proposed. To reduce the search range, FPBA with two motion parameters is applied first to provide predicted sprite points.
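A deliberately brute-force Python sketch of SPMA makes equations (1) and (2) concrete. It assumes an isotropic warp in which sprite point p0 is the image of the top-left corner (0, 0) and p1 the image of the top-right corner (width, 0), samples RF with bilinear interpolation and border clamping, and searches every integer offset of both sprite points. The helper names are hypothetical.

```python
def warp(x, y, p0, p1, width):
    """Isotropic warp defined by two sprite points (assumed corners (0,0) and (width,0))."""
    a = (p1[0] - p0[0]) / width
    b = (p1[1] - p0[1]) / width
    return p0[0] + a * x - b * y, p0[1] + b * x + a * y

def bilinear(ref, x, y):
    """Sample RF at a fractional position, clamping to the frame border."""
    h, w = len(ref), len(ref[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = ref[y0][x0] * (1 - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (1 - fx) + ref[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def sad(cur, ref, p0, p1):
    """Equation (1): SAD between CF and the GMC frame for one pair of sprite points."""
    width = len(cur[0])
    return sum(abs(cur[y][x] - bilinear(ref, *warp(x, y, p0, p1, width)))
               for y in range(len(cur)) for x in range(width))

def spma(cur, ref, p0, p1, radius):
    """Equation (2): exhaustive search over integer offsets of both sprite points."""
    offsets = [(dx, dy) for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)]
    candidates = [((p0[0] + ax, p0[1] + ay), (p1[0] + bx, p1[1] + by))
                  for ax, ay in offsets for bx, by in offsets]
    return min(candidates, key=lambda pair: sad(cur, ref, *pair))
```

The number of candidate pairs grows as (2 * radius + 1)^4, and each one requires a whole-frame warp: 81 warps for radius 1 and 2401 for radius 3. This is the cost that the iterative search avoids.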
In the two-parameter FPBA, the feature points are first found in the current frame. The feature points are the points with higher Hessian values [6]:
$$ H(x,y) = \frac{\partial^2 CF(x,y)}{\partial x^2}\,\frac{\partial^2 CF(x,y)}{\partial y^2} - \left( \frac{\partial^2 CF(x,y)}{\partial x\,\partial y} \right)^2 \tag{3} $$
In local motion estimation, a cross matching algorithm in an L-neighborhood (typically L = 4) around each feature point is applied [6]. The feature points with higher sum of absolute difference (SAD) values are dropped. After that, the two global motion parameters are simply the average of the remaining motion vectors, from which the predicted locations of the sprite points can be derived.
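The two-parameter FPBA stage can be sketched in Python as follows. The central finite-difference approximation of equation (3), the `keep_ratio` fraction of feature points retained after dropping the high-SAD ones, and all function names are assumptions for illustration. The local motion vectors and their SADs are taken as inputs rather than computed with the cross matching search.

```python
def hessian(img, x, y):
    """Equation (3) with central finite differences (img is indexed img[y][x])."""
    def f(i, j):
        return img[j][i]
    fxx = f(x + 1, y) - 2 * f(x, y) + f(x - 1, y)
    fyy = f(x, y + 1) - 2 * f(x, y) + f(x, y - 1)
    fxy = (f(x + 1, y + 1) - f(x + 1, y - 1)
           - f(x - 1, y + 1) + f(x - 1, y - 1)) / 4
    return fxx * fyy - fxy * fxy

def global_translation(mvs, sads, keep_ratio=0.75):
    """Drop the feature points with the highest SAD, then average the remaining MVs."""
    order = sorted(range(len(mvs)), key=lambda i: sads[i])
    kept = order[:max(1, int(len(mvs) * keep_ratio))]
    mx = sum(mvs[i][0] for i in kept) / len(kept)
    my = sum(mvs[i][1] for i in kept) / len(kept)
    return mx, my

def predicted_sprite_points(mv, width):
    """Two-parameter model: both corners (0, 0) and (width, 0) shift by the global MV."""
    return (mv[0], mv[1]), (width + mv[0], mv[1])
```

For example, with motion vectors [(1, 0), (1, 0), (1, 0), (9, 9)] and SADs [10, 12, 11, 500], the outlier is dropped and the global motion vector is (1.0, 0.0).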
ISPMA uses the same criterion as (2) but searches the sprite points differently. The search scheme of ISPMA is shown in Fig. 3. Figure 3(a) shows the predicted sprite points A and B. We fix point A, move point B within a predefined search range, and find the optimal location B', as shown in Fig. 3(b). Then we fix point B' and find the optimal location A', as shown in Fig. 3(c). Next, the procedure is applied again with A' and B' as the new predicted sprite points, and the search range can be reduced. After the optimal locations of the sprite points are found at integer-pixel precision, a full search is employed to find the optimal locations at half-pixel precision, as shown in Fig. 4, where 9 x 9 = 81 candidate locations are searched. In many experiments, we find that 52 feature points and search ranges of (3 x 3)-(3 x 3)-(1 x 1) give accurate locations of the sprite points.
4. HARDWARE ARCHITECTURE
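Before the algorithm is mapped to hardware, the ISPMA search of Section 3 can be summarized in a short Python sketch. Here `cost(a, b)` stands for the SAD of equation (1) evaluated for sprite points a and b. The per-pass search radii are a hypothetical parameter; reading the (3 x 3)-(3 x 3)-(1 x 1) schedule as window sizes gives radii (1, 1, 0), but that mapping is an assumption. Each pass fixes one sprite point while searching the other, and a final joint half-pel search evaluates the 9 x 9 = 81 candidate pairs.

```python
def ispma(cost, a, b, radii=(1, 1, 0)):
    """Alternating integer-pel searches followed by a joint half-pel full search.

    cost(a, b): SAD of equation (1) for sprite points a and b (supplied by the caller).
    radii: hypothetical search radius, in integer pixels, for each pass.
    """
    for r in radii:
        offsets = [(dx, dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)]
        # Fix A and move B (Fig. 3(b)); the zero offset keeps the current B as a candidate.
        b = min(((b[0] + dx, b[1] + dy) for dx, dy in offsets),
                key=lambda p: cost(a, p))
        # Fix B' and move A (Fig. 3(c)).
        a = min(((a[0] + dx, a[1] + dy) for dx, dy in offsets),
                key=lambda p: cost(p, b))
    # Joint half-pel refinement in [-0.5, 0.5]: 9 x 9 = 81 candidate pairs (Fig. 4).
    half = [(dx * 0.5, dy * 0.5) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    pairs = [((a[0] + ax, a[1] + ay), (b[0] + bx, b[1] + by))
             for ax, ay in half for bx, by in half]
    return min(pairs, key=lambda pair: cost(*pair))
```

Because the zero offset is always among the candidates, no pass can increase the cost. The price of this greedy, one-point-at-a-time search is that it can stop in a local minimum, which is why a good prediction from FPBA matters.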
Based on the proposed algorithm, a hardware architecture for GME is proposed in this section. The block diagram of the proposed architecture is shown in Fig. 5. FPBA is executed in Hessian, Sorting, ME, and Regression to generate the predicted global motion parameters. Then the SPM PE Array (Sprite Point Matching Processing Element Array) executes ISPMA to refine the predicted locations of the sprite points and outputs the final results. An off-chip frame memory is required in this system to store the current frame and the reference frame. The detailed architectures of these modules are described in the following subsections.
Figure 3: Iterative sprite point matching algorithm (panels (a), (b), and (c)).
Figure 4: Full search of sprite points in the range [-0.5, 0.5].
4.1. FPBA Part
Hessian calculates the Hessian value of each pixel in the current frame. The detailed architecture is shown in Fig. 6, where a delay line architecture is used. Note that, in order to spread the feature points over the frame, the frame is divided into four parts, and 13 feature points are chosen in each part, so only delay lines with W/2 delay elements are required, where W is the frame width. In addition, most of the delay elements are implemented with two-port RAM to reduce the hardware cost. The Sorting module then selects the pixels with the 13 largest Hessian values in each quarter frame as the feature points.
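The Sorting task, keeping the 13 largest Hessian values in each quarter frame as pixels arrive in raster order, has a simple software analogue with one bounded min-heap per quadrant. The Python sketch below uses the standard `heapq` module and is only a behavioral model; it makes no claim about how the hardware sorter is built.

```python
import heapq

def select_feature_points(hess, k=13):
    """Keep the k largest Hessian values in each quarter frame (4 * k points in total).

    hess is indexed hess[y][x]; returns, per quadrant, (value, x, y) tuples sorted
    from largest to smallest. Quadrant order: top-left, top-right, bottom-left, bottom-right.
    """
    h, w = len(hess), len(hess[0])
    heaps = [[] for _ in range(4)]            # one bounded min-heap per quadrant
    for y in range(h):                        # raster-scan order, as pixels stream in
        for x in range(w):
            q = (y >= h // 2) * 2 + (x >= w // 2)
            item = (hess[y][x], x, y)
            if len(heaps[q]) < k:
                heapq.heappush(heaps[q], item)
            elif item > heaps[q][0]:          # beats the smallest value kept so far
                heapq.heapreplace(heaps[q], item)
    return [sorted(hq, reverse=True) for hq in heaps]
```

Each quadrant only ever holds k entries, so the storage is fixed regardless of frame size; with k = 13 this yields the 52 feature points used by the local motion estimation.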
Since local motion estimation is applied at only 52 feature points, a single processing element is sufficient; its architecture is similar to that of conventional motion estimation architectures. The motion vectors are then accumulated in Regression, and a multi-cycle shift-and-subtract divider computes the average motion vector as the global motion vector.
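The averaging step divides an accumulated motion vector sum by the number of retained feature points. A multi-cycle shift-and-subtract (restoring) divider produces one quotient bit per cycle; the Python sketch below models that behavior. The operand width `bits` and the sign handling (divide the magnitude, truncate toward zero) are assumptions, not details given in the text.

```python
def shift_subtract_divide(dividend, divisor, bits=16):
    """Restoring division of non-negative integers, one quotient bit per 'cycle'."""
    assert 0 <= dividend < (1 << bits) and divisor > 0
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:   # subtract only when the divisor fits
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

def average_mv(mv_sum, count, bits=16):
    """Average of an accumulated MV component; sign handled separately, magnitude truncated."""
    q, _ = shift_subtract_divide(abs(mv_sum), count, bits)
    return -q if mv_sum < 0 else q
```

A `bits`-bit dividend takes `bits` iterations, which is acceptable here because the division happens only once per frame per motion vector component.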
4.2. Sprite Point Matching PE Array
Figure 7 shows the detailed architecture of the SPM PE Array. The SAD values of different sprite point locations are calculated in different PEs in parallel. In each PE, a GMC frame is generated, and the SAD between the GMC frame and the current frame is calculated.
Figure 5: Overview of the proposed hardware architecture.
Figure 6: Architecture of Hessian value calculation.
In Fig. 7, the GMC unit generates the location in the reference frame that corresponds to the current pixel. The operation can be described by the following equations [8]:
$$ u(x,y) = c_0 + \frac{c_1 x + c_2 y}{c_6} \tag{4} $$
$$ v(x,y) = c_3 + \frac{c_4 x + c_5 y}{c_6} \tag{5} $$
$$ x' = \lfloor u(x,y) \rfloor / s \tag{6} $$
$$ y' = \lfloor v(x,y) \rfloor / s \tag{7} $$
where $c_0, c_1, \ldots, c_6$ can be derived directly from the locations of the sprite points and are constant over the whole frame; $c_1 = c_5$ and $c_2 = -c_4$ for GMC with four parameters; s is the warping precision and can be 2, 4, 8, or 16; $c_6$ is always a power of two; (x, y) is the current pixel; and (x', y') is the corresponding pixel in the reference frame. Since all the divisors are powers of two, no divider is needed in GMC. Furthermore, the multiplications can be replaced with additions by the following recurrences [7]:
$$ u(0,0) = c_0 \tag{8} $$
$$ u(x+1,y) = u(x,y) + c_1/c_6 \tag{9} $$
$$ u(x,y+1) = u(x,y) + c_2/c_6 \tag{10} $$
The same multiplication-free technique can also be applied to $v(\cdot)$. Set Constant first generates the constants
$c_0$, $c_3$, $c_1/c_6$, and $c_2/c_6$ for each candidate sprite point location. GMC then finds the corresponding location (x', y') of the current point (x, y). Next, the address generator (AG) derives the address used to access the local memory (LM), which contains four banks and can output four values at the same time. Bilinear interpolation is executed in Interpolation to obtain the pixel value. The SAD between the current frame and the GMC frame is stored in SAD Register. Finally, Compare Tree finds the minimum SAD value and outputs the optimal sprite point location. Note that an on-chip Reference Frame Data Buffer is required for data reuse, and the contents of the LMs of all PEs are identical in order to simplify memory access.
5. SIMULATION RESULTS
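Before turning to the results, the multiplication-free GMC datapath of Section 4.2 can be modeled in a few lines of Python. The sketch assumes that c0 and c3 are integers in 1/s-pel units and that c1, c2, c4, and c5 are integer numerators over c6 = 2**log2_c6. Under these assumptions, recurrences (8)-(10) become pure integer additions, and the divisions by c6 and s become shifts. The four-bank interleaving by coordinate parity, the rounding in the interpolation, and the absence of border clamping are hypothetical choices made for illustration.

```python
def gmc_positions(c0, c1, c2, c3, c4, c5, log2_c6, width, height):
    """Yield (x, y, U, V) with U = floor(u(x, y)) and V = floor(v(x, y)), adds and shifts only."""
    row_u, row_v = c0 << log2_c6, c3 << log2_c6   # c6 * u(0, 0), c6 * v(0, 0): eq. (8)
    for y in range(height):
        acc_u, acc_v = row_u, row_v
        for x in range(width):
            yield x, y, acc_u >> log2_c6, acc_v >> log2_c6   # floor division by c6
            acc_u += c1                                   # eq. (9)
            acc_v += c4
        row_u += c2                                       # eq. (10)
        row_v += c5

def bank_of(xi, yi):
    """Hypothetical 4-bank interleaving: any 2x2 neighborhood hits four different banks."""
    return ((yi & 1) << 1) | (xi & 1)

def interpolate(ref, U, V, log2_s):
    """Bilinear interpolation at (U/s, V/s), eqs. (6)-(7); no border clamping."""
    s = 1 << log2_s
    xi, yi = U >> log2_s, V >> log2_s     # integer-pel part
    fx, fy = U & (s - 1), V & (s - 1)     # fractional part in 1/s-pel units
    p00, p10 = ref[yi][xi], ref[yi][xi + 1]
    p01, p11 = ref[yi + 1][xi], ref[yi + 1][xi + 1]
    acc = (p00 * (s - fx) + p10 * fx) * (s - fy) + (p01 * (s - fx) + p11 * fx) * fy
    return (acc + (s * s) // 2) >> (2 * log2_s)   # round to nearest integer
```

With s = 2 and c6 = 16, the identity warp corresponds to c1 = c5 = 32 and c2 = c4 = 0, which satisfies the four-parameter constraint c1 = c5, c2 = -c4. The bank mapping is what lets the four pixels needed by one bilinear interpolation be read from the LM in a single access.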
5.1. Proposed Algorithm
The PSNR curves of the GMC frames are shown in Fig. 8, where GME in the VM, FPBA with 52 and 800 feature points, and the proposed hybrid algorithm (FPBA+ISPMA) are compared. In Fig. 8(a), the sequence Stefan in CIF format, which contains panning and zooming camera motion, is tested. The performance of VM GME and that of the proposed algorithm are very similar and hard to distinguish. The performance of FPBA is neither stable nor robust: its PSNR fluctuates, which can be observed when playing back the global motion compensated sequence. The results for the sequence Table Tennis are presented in Fig. 8(b), where a scene change occurs at frame 132 and a large zooming motion occurs in the first 30 frames. The proposed algorithm has the same performance as VM GME and is robust to scene changes.
Figure 7: Architecture of the sprite point matching global motion estimation PE array.
Table 1: Results of the hardware implementation of GME for MPEG-4 ASP@L3.
Unit                     Gate count   Internal memory size (bits)
Hessian                        3328          5632
Sorting                        9918             0
LME                             856          2016
Regression                      999             0
SPM PE Array (9 PEs)          45819          3528
Set Constant                   3612             0
Compare Tree                   1584             0
Frame Buffer                      0         19712
Total                         66116         30888
5.2. Hardware Implementation
The results of the hardware implementation of the proposed GME architecture are shown in Table 1. The target is ASP@L3; that is, the input frames are in CIF format. The architecture is implemented and simulated in Verilog-HDL and synthesized with SYNOPSYS Design Compiler. We adopt the AVANT! 0.35 μm cell library, and the target working frequency is 100 MHz. At this clock rate, nine PEs are required in the SPM PE Array. As Table 1 shows, the dominant module is the SPM PE Array; the total gate count is about 66K, and the required internal memory size is about 31Kb, which is feasible in today's technology and reasonable to integrate into MPEG-4 ASP encoders.
6. CONCLUSION
In this paper, the hardware implementation issues of global motion estimation (GME) in MPEG-4 Advanced Simple Profile are discussed. By combining a feature points based algorithm with a novel iterative sprite point matching algorithm, a hardware-oriented GME algorithm is proposed, together with the associated hardware architecture. For MPEG-4 ASP@L3, the gate count is 66K, the internal memory size is 31Kb, and the working frequency is 100 MHz. This architecture is suitable for integration into MPEG-4 ASP encoders to provide GME/GMC capability.
Figure 8: PSNR curves of VM GME, feature point based GME, and the proposed GME algorithm. (a) Stefan. (b) Table Tennis.
7. REFERENCES
[1] MPEG Video Group, Amendment 4: Streaming Video Profile, ISO/IEC JTC 1/SC 29/WG11 N3904, 2001.
[2] A. Luthra, R. Gandhi, K. Panusopone, K. Mckoen, D. Baylon, and L. Wang, "Performance of MPEG-4 profiles used for streaming video," in Proc. of 2002 Workshop and Exhibition on MPEG-4, 2002, pp. 103-106.
[3] F. Moscheni, F. Dufaux, and M. Kunt, "A new two-stage global/local motion estimation based on a background/foreground segmentation," in Proc. of International Conference on Acoustics, Speech, and Signal Processing 1995, 1995, pp. 2261-2264.
[4] D. Adolph and R. Buschmann, "1.15 Mbit/s coding of video signals including global motion compensation," Signal Processing: Image Communication, vol. 3, no. 2, 1991.
[5] The MPEG-4 Video Standard Verification Model version 18.0, ISO/IEC JTC 1/SC 29/WG11 N3908, 2001.
[6] A. Smolic, T. Sikora, and J.-R. Ohm, "Long-term global motion estimation and its application for sprite coding, content description, and segmentation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1227-1242, Dec. 1999.
[7] W. Badawy and M. Bayoumi, "A multiplication-free algorithm and a parallel architecture for affine transformation," Journal of VLSI Signal Processing, vol. 31, no. 2, pp. 173-184, June 2002.
[8] MPEG Video Group,