In this thesis, we propose a macroblock-level pipelining architecture for an H.264/MPEG-4 AVC decoder that combines the platform-based design methodology with the application-specific circuit design methodology. With the platform-based design methodology, the software procedures and hardware modules retain a high degree of reusability.
This shortens the design cycle, so our design can be quickly integrated into industrial applications. With the application-specific circuit design methodology, we conduct task partitioning and scheduling at the macroblock level to enhance the overall decoding throughput. The software handles the branching (data-dependent) control flow, while the hardware accelerators speed up the regular, computationally intensive modules.
For hardware acceleration, we present a platform-based bus-interleaved architecture for the deblocking filter in H.264/MPEG-4 AVC. We have shown that performing data transfer and filtering in parallel significantly reduces the processing latency. Moreover, classifying the macroblock filtering mode avoids redundant data transfers and thus uses the bus bandwidth efficiently.
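Why overlapping bus transfer with filtering shortens latency can be illustrated with a simple two-stage timing model. The sketch below assumes that a macroblock is moved over the bus in `n_chunks` equal pieces, each taking `transfer` cycles, and that filtering one piece takes `filt` cycles. These names and the cycle counts are hypothetical: they do not describe the actual bus protocol, data partitioning, or measured results of this design.

```python
# Hypothetical latency model (not the thesis's measured data): a macroblock
# is moved over the bus in n_chunks pieces, each followed by filtering.


def sequential_latency(n_chunks: int, transfer: int, filt: int) -> int:
    """Transfer and filtering never overlap: every step is exposed."""
    return n_chunks * (transfer + filt)


def interleaved_latency(n_chunks: int, transfer: int, filt: int) -> int:
    """Transfer of chunk i+1 overlaps filtering of chunk i.

    The first transfer and the last filtering step are exposed; in between,
    each step costs whichever of the two operations is slower.
    """
    if n_chunks <= 0:
        return 0
    return transfer + (n_chunks - 1) * max(transfer, filt) + filt


if __name__ == "__main__":
    # Illustrative cycle counts only.
    print(sequential_latency(4, 16, 8))
    print(interleaved_latency(4, 16, 8))
```

With these illustrative numbers the sequential model needs 96 cycles and the interleaved model 72, because only the first transfer and the last filtering step remain exposed.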
In addition, we use a bus-interleaved IQ-IDCT and deblocking filter to perform data transfer, inverse quantization and transformation, reconstruction, and deblocking filtering in parallel. Compared with a traditional shared-memory architecture, we have shown that this removes the intermediate buffer while achieving the same performance.
Based on the dedicated accelerators and macroblock-level pipelining, our proposed decoder achieves a significant speed improvement using both software and bus-interleaved hardware designs. By taking advantage of the interleaved processing method, bus-interleaved motion compensation (MC) and intra prediction can also be implemented. Our goal is to process data transfer and functional computation in parallel for all dedicated accelerators, since interleaved processing reduces the processing latency.
In addition, intermediate data are passed directly to the next accelerator, which avoids intermediate buffers in the macroblock-level pipeline. Thus, the proposed decoder forms a non-buffered memory architecture across all bus-interleaved accelerators, achieving both low cost and high performance.
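A simplified view of macroblock-level pipelining is a lock-step pipeline in which each stage works on a different macroblock in every time slot and hands its result directly to the next stage. The sketch below assumes four hypothetical stages with fixed per-macroblock cycle counts; the stage split (including placing entropy decoding in software) and the numbers are illustrative assumptions, not the partitioning or measurements of the proposed decoder.

```python
from typing import List, Optional

# Hypothetical stages and cycle counts, for illustration only.
STAGES = ["entropy decoding (SW)", "IQ/IDCT", "prediction", "deblocking"]
STAGE_CYCLES = [400, 300, 350, 250]


def sequential_cycles(n_mbs: int, stage_cycles: List[int]) -> int:
    """Each macroblock passes through all stages before the next one starts."""
    return n_mbs * sum(stage_cycles)


def pipelined_cycles(n_mbs: int, stage_cycles: List[int]) -> int:
    """Lock-step pipeline: every slot lasts as long as the slowest stage."""
    return (n_mbs + len(stage_cycles) - 1) * max(stage_cycles)


def pipeline_schedule(n_mbs: int, n_stages: int) -> List[List[Optional[int]]]:
    """Row s lists the macroblock index each stage handles in slot s.

    Each stage holds one macroblock per slot and hands it straight to the
    next stage, so no frame-level intermediate buffer appears in this model.
    """
    table = []
    for slot in range(n_mbs + n_stages - 1):
        row = []
        for stage in range(n_stages):
            mb = slot - stage
            row.append(mb if 0 <= mb < n_mbs else None)
        table.append(row)
    return table


if __name__ == "__main__":
    n_mbs = 99  # one QCIF frame (176x144) contains 11 x 9 = 99 macroblocks
    print(sequential_cycles(n_mbs, STAGE_CYCLES))
    print(pipelined_cycles(n_mbs, STAGE_CYCLES))
    for row in pipeline_schedule(3, len(STAGES)):
        print(row)
```

In this model the pipelined decoder pays for the slowest stage once per macroblock instead of the sum of all stages (40800 versus 128700 cycles for the assumed numbers), so balancing the stage loads matters as much as accelerating any single stage.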
2. Processor-based chip implementation
Our system is processor-based: it contains an ARM processor that executes the software operations and controls the hardware. Implementing a processor-based chip is more challenging than implementing a dedicated VLSI ASIC. Our goal is a system-on-chip implementation, and the expected features of our chip are listed as follows.
− Processor-based configuration.
− Low cost.
− Low power consumption.
− High processing ability.
− Flexibility.
− IP reusability.