Chapter 5 Experimental Results and Analysis
5.2 Hardware Design Evaluation
5.2.2 FPGA Based Evaluation Platform
We build our verification platform on an ARM-based emulation board, configured as shown in Figure 41. The ARM emulation board consists of two main parts, a core module and a logic module. The core module contains an ARM966 CPU, 1 Mbyte of embedded SRAM, and an external memory interface. The dedicated accelerators are implemented on the logic module, which is an FPGA (Field-Programmable Gate Array). The core module and the logic module communicate through AHB bus interfaces. In addition, the ARM Integrator baseboard uses a JTAG (Joint Test Action Group) interface to connect to an ARM MultiICE, which in turn connects to a host computer and carries the communication between the computer and the ARM board. In this FPGA environment, we run the MPEG-4 encoder to verify our BBME design. The BBME is implemented on the FPGA, and the ARM CPU handles the remaining parts of the MPEG-4 encoder. The BBME architecture passed this verification.
Architecture of ARM platform based MPEG-4 Encoder with BBME
Figure 41. FPGA based test environment
Chapter 6 Conclusion and Future Work
In this thesis, we presented a low-power ME design for bi-directional search. The proposed ME design contains two main parts, IME and MD-SME. For low-power applications, we presented a BBME architecture that processes the forward and backward searches in parallel. Such a design halves the memory accesses to the current frame and shares the operation engine between the two searches to keep the hardware as busy as possible. For P-frame search, this parallel architecture divides the original search into two sub-groups of partial P-frame search to double the processing throughput. For the hardware design, we proposed three new features to improve hardware efficiency: the MBPPU, a hardware-efficient LV2 design, and the integration of the 16×16 and 8×8 searches. Besides optimizing the IME, we integrated the MD module into the SME so that MD and SME no longer require two separate processing loops, which saves power. In MD, this work adopts a new line-based algorithm to reduce the long latency of the original two-dimensional algorithm and to avoid hardware idling. In SME, we adopt an architecture that processes 3 search locations in parallel to reduce memory access and power consumption. To let IME and MD-SME run in parallel, the system is pipelined, which enhances throughput and avoids hardware idling. This work completes one bi-directional MB search in 147 cycles with a gate count of 131 K and 51 Kbits of on-chip memory using TSMC 0.18 μm technology. The power consumption for CIF at 30 fps is 11.8 mW.
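As a rough illustration of what the reported cycle count implies, the following sketch estimates the minimum clock rate needed for CIF at 30 fps. It assumes, which the text above does not state, that every macroblock of every frame requires exactly one 147-cycle bi-directional search and that no other overhead is counted; the function names are hypothetical.

```python
# Back-of-the-envelope clock budget for the reported cycle count.
# Assumption (not stated in the thesis): every 16x16 macroblock of every
# frame needs exactly one bi-directional search, with no extra overhead.

CIF_WIDTH, CIF_HEIGHT = 352, 288  # CIF luma resolution in pixels
MB_SIZE = 16                      # macroblock edge length in pixels
FRAME_RATE = 30                   # frames per second
CYCLES_PER_MB = 147               # reported cycles per bi-directional MB search


def macroblocks_per_frame(width: int, height: int, mb_size: int = MB_SIZE) -> int:
    """Number of whole macroblocks in one frame."""
    return (width // mb_size) * (height // mb_size)


def required_clock_hz(width: int, height: int, fps: int, cycles_per_mb: int) -> int:
    """Minimum clock rate (cycles per second) under the assumption above."""
    return macroblocks_per_frame(width, height) * fps * cycles_per_mb


if __name__ == "__main__":
    mbs = macroblocks_per_frame(CIF_WIDTH, CIF_HEIGHT)
    clock = required_clock_hz(CIF_WIDTH, CIF_HEIGHT, FRAME_RATE, CYCLES_PER_MB)
    print(f"{mbs} MBs/frame, about {clock / 1e6:.2f} MHz minimum clock")
```

Under these assumptions a CIF frame holds 396 macroblocks, so the search needs about 1.75 million cycles per second. This is only a lower bound derived from the cycle count, not the operating frequency of the actual design.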
In future work, we will focus on integrating BBSME with the H.264/MPEG-4 AVC standard [20], which supports multiple reference frames. With some minor modifications, this work can be extended to ME with any combination of multiple forward or backward reference frames for throughput improvement.
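The idea of sharing one current-block fetch across several reference frames can be sketched in software as follows. This is a minimal sketch that uses a plain full-search SAD as a stand-in; it is not the IME or MD-SME algorithm of this work, and the names `sad` and `search_all_refs`, the 4×4 block size, and the ±2 search range are hypothetical choices for illustration. All frames are assumed to have the same dimensions.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples, indexed as frame[y][x]


def sad(cur_block: Frame, ref: Frame, x: int, y: int) -> int:
    """Sum of absolute differences between cur_block and the ref block at (x, y)."""
    return sum(
        abs(c - ref[y + j][x + i])
        for j, row in enumerate(cur_block)
        for i, c in enumerate(row)
    )


def search_all_refs(
    cur: Frame, refs: List[Frame], bx: int, by: int,
    block: int = 4, search_range: int = 2,
) -> List[Optional[Tuple[int, int, int]]]:
    """Return the best (dx, dy, sad) for each reference frame.

    The current block is fetched once and reused for every reference
    frame, instead of being re-read once per reference.
    """
    h, w = len(cur), len(cur[0])
    cur_block = [row[bx:bx + block] for row in cur[by:by + block]]  # single fetch
    best: List[Optional[Tuple[int, int, int]]] = [None] * len(refs)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue  # candidate block falls outside the frame
            # Every reference frame is matched against the same cur_block.
            for k, ref in enumerate(refs):
                cost = sad(cur_block, ref, x, y)
                if best[k] is None or cost < best[k][2]:
                    best[k] = (dx, dy, cost)
    return best
```

Passing one forward and one backward reference gives the bi-directional case; passing a longer list of references models the multiple-reference extension without adding any fetches of the current block.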
Bibliography
[1] Information technology - coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s - part 2: video, ISO/IEC 11172-2, 1993.
[2] Information technology - generic coding of moving pictures and associated audio information: video, ISO/IEC 13818-2 and ITU-T Rec. H.262, 1996.
[3] MPEG-4 overviews, ISO/IEC JTC1/SC29/WG11 N4668, 2002.
[4] Video coding for low bit rate communication, ITU-T Rec. H.263, 1998.
[5] Video codec for audio visual services at 64 kbit/s, ITU-T Rec. H.261, 1993.
[6] “ISO/IEC 14496-5:2001 Final Committee Draft”, MPEG01/N4025.
[7] S. Kalra and M. N. Chong, “Bi-directional motion estimation via vector propagation,” IEEE Trans. on Circuits and Syst. for Video Technol., vol. 8, pp. 976-987, Dec. 1998.
[8] Y. Keller and A. Averbuch, “Fast motion estimation using bi-directional gradient method,” IEEE Trans. on Image Processing, vol. 13, pp. 1042-1054, Aug. 2004.
[9] J.-H. Luo, et al., “A novel all-binary motion estimation (ABME) with optimized hardware architectures,” IEEE Trans. on Circuits and Syst. for Video Technol., vol. 12, pp. 700-712, Aug. 2002.
[10] S.-H. Wang, et al., “Platform based design of all binary motion estimation with bus interleaved architecture,” IEEE International Symposium on VLSI Design, Automation and Test, pp. 241-244, April 2005.
[11] W. E. Lynch, “Bidirectional motion estimation based on P-frame motion vectors and area overlap,” IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 445-448, March 1992.
[12] S. Kozu and S. Kulkarni, “A new technique for block-based motion compensation,” IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 217-220, April 1994.
[13] J. Ge and G. Mirchandani, “A new hybrid block-matching motion estimation algorithm,” IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, vol. 4, pp. 241-244, May 2002.
[14] M. A. Elgamel, et al., “Systolic array architectures for full-search block matching motion estimation,” Third Int’l Workshop on Digital and Computational Video, pp. 108-115, Nov. 2002.
[15] Y.-K. Lai, “A memory efficient motion estimator for three step search block-matching,” IEEE Trans. on Consumer Electronics, vol. 47, pp. 644-651, Aug. 2001.
[16] J.-F. Shen, et al., “A novel low power full search block matching motion estimation design for H.263+,” IEEE Trans. on Circuits and Syst. for Video Technol., vol. 11, pp. 890-897, July 2001.
[17] Y.-W. Huang, et al., “Global elimination algorithm and architecture design for fast block matching motion estimation,” IEEE Trans. on Circuits and Syst. for Video Technol., vol. 14, pp. 898-907, June 2004.
[18] M. Miyama, et al., “A sub-mW MPEG-4 motion estimation processor core for mobile video application,” IEEE Journal of Solid-State Circuits, vol. 39, pp. 1562-1570, Sept. 2004.
[19] J. H. Lee, et al., “A fast multi-resolution block matching algorithm and its LSI architecture for low bit-rate video coding,” IEEE Trans. on Circuits and Syst. for Video Technol., vol. 11, pp. 1289-1301, Dec. 2001.
[20] T. Wiegand, et al., “Overview of the H.264/AVC video coding standard,” IEEE Trans. on Circuits and Syst. for Video Technol., vol. 13, pp. 560-576, July 2003.
Vita
Shih-Hsin Tai (戴世炘) was born in Hsinchu, Taiwan, in 1981. He received the BS degree from the Department of Electrical Engineering, National Tsing Hua University (NTHU), Hsinchu, Taiwan, in 2004, and then entered the Institute of Electronics, National Chiao Tung University, to pursue the master's degree. His research interests are multimedia video compression and digital circuit design.