6.1 Conclusion
In this thesis, we investigated the implementation of a real-time MPEG-4 object-based video encoder on the dual-core PAC platform.
First, we focused on the correctness of the encoded bitstream, which was verified against the MPEG-4 reference software, MoMuSys. Then, we profiled the MPEG-4 object-based video encoder on a PC, which gave us an initial understanding of the encoding flow and the computationally critical parts. Based on this analysis, we designed our dual-core structure and implemented the DSP part on the PACDSP platform. The dual-core results were verified against the single-core platform.
After the implementation was verified, we further analyzed the encoding algorithm and coding flow to identify any removable computation. Based on our analysis, we optimized the program sequence to reduce the computational complexity without significant quality loss or bit-rate increase. In addition, we applied several general software optimization techniques, such as static rescheduling, loop unrolling, and software pipelining, to reduce stalls.
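As an illustration of the loop unrolling mentioned above, the following is a minimal sketch in C, not the code used in this work. It assumes a hypothetical kernel, the sum of absolute differences (SAD) over one 16-pixel row, and the function names are chosen only for this example. Unrolling by four with independent accumulators removes loop overhead and shortens the dependency chain on the accumulator, which gives a static scheduler more freedom to fill otherwise stalled slots.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference version: one absolute difference per loop iteration. */
static unsigned sad_row16_rolled(const uint8_t *cur, const uint8_t *ref)
{
    assert(cur != NULL && ref != NULL);
    unsigned sad = 0;
    for (int i = 0; i < 16; i++)
        sad += (unsigned)abs((int)cur[i] - (int)ref[i]);
    return sad;
}

/* Unrolled by four (hypothetical factor). The four partial sums are
 * independent, so consecutive operations do not wait on one accumulator. */
static unsigned sad_row16_unrolled(const uint8_t *cur, const uint8_t *ref)
{
    assert(cur != NULL && ref != NULL);
    unsigned s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < 16; i += 4) {
        s0 += (unsigned)abs((int)cur[i]     - (int)ref[i]);
        s1 += (unsigned)abs((int)cur[i + 1] - (int)ref[i + 1]);
        s2 += (unsigned)abs((int)cur[i + 2] - (int)ref[i + 2]);
        s3 += (unsigned)abs((int)cur[i + 3] - (int)ref[i + 3]);
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value for any input; only the instruction schedule differs. The best unrolling factor depends on the register file and the issue width of the target, so the factor of four here is an assumption rather than a recommendation for PACDSP.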
Finally, the optimization results were discussed. In the best case, stefan, which has the smallest VOP size, we can encode MPEG-4 video at nearly 43 frames per second for intra coding and 35 frames per second for inter coding. The program size was about 29 KB, which is smaller than the instruction cache size. In addition, the data size used by each coder was within the memory limit provided on PACDSP. Therefore, no cache-miss problem occurred in our implementation. In conclusion, the performance and quality of our implementation of the MPEG-4 object-based video encoder on the PAC system were competitive.
6.2 Potential Future Work
There are several improvements and extensions that can be considered in the future:
• Data structure refinement
The data structure is very important to implementations on DSPs. If we can design a more efficient data structure, memory accesses can be significantly reduced and performance improved.
• Add popular fast motion estimation algorithms
Motion estimation is the most computationally intensive part of the MPEG-4 video encoder. However, many fast motion estimation algorithms have been proposed and are widely used. We consider adding some of these algorithms for flexibility.
• Dual-core load balancing
As the estimated frame rates in the previous chapter show, the bottleneck is still the execution time of the ARM part. If we shift more computation to the PACDSP part, performance will improve by better exploiting the dual-core implementation.
• Add other MPEG-4 tools
To simplify our implementation, the error-resilience tools in the MPEG-4 simple profile were omitted. However, these tools are very important when the bitstream is transmitted through real channels. In the future, we need to implement error-resilience techniques such as resynchronization, data partitioning, and reversible variable length coding (RVLC). Moreover, other advanced profiles of MPEG-4 video compression can be implemented to extend the capability of PACDSP.
• Verify the ISS simulator results on the PAC system
We have completed the dual-core implementation on the ARM926EJ-S platform, and the bitstream has been verified against the ADS single-core version. Since some coding constraints are not included in the ISS, we need to make some modifications to fit the PACDSP chips. To verify the ISS simulator results, more program conditions need to be tested.
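To make the fast motion estimation item above more concrete, the following is a minimal sketch in C of one well-known fast algorithm, the three-step search. It is offered only as an example of the kind of algorithm that could be added, not as the algorithm chosen for this work. The block size, the `MotionVector` type, and the function names are assumptions made for this illustration, and both frames are assumed to be stored as 8-bit luminance arrays whose stride equals the frame width.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLK 16  /* macroblock size assumed for this sketch */

typedef struct { int dx, dy; } MotionVector;  /* hypothetical type */

/* SAD between the BLK x BLK block of cur at (cx, cy) and of ref at (rx, ry). */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int stride,
                          int cx, int cy, int rx, int ry)
{
    unsigned sad = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++)
            sad += (unsigned)abs((int)cur[(cy + y) * stride + cx + x] -
                                 (int)ref[(ry + y) * stride + rx + x]);
    return sad;
}

/* Three-step search: test the 8 neighbours of the current best position at
 * step sizes 4, 2 and 1, re-centring on the best match after each step.
 * This covers displacements up to +/-7 pixels with at most 25 SAD
 * evaluations, compared with 225 for a full search over the same window. */
static MotionVector three_step_search(const uint8_t *cur, const uint8_t *ref,
                                      int width, int height, int bx, int by)
{
    assert(cur != NULL && ref != NULL);
    assert(bx >= 0 && by >= 0 && bx + BLK <= width && by + BLK <= height);

    MotionVector best = {0, 0};
    unsigned best_sad = block_sad(cur, ref, width, bx, by, bx, by);

    for (int step = 4; step >= 1; step /= 2) {
        MotionVector center = best;
        for (int dy = -step; dy <= step; dy += step) {
            for (int dx = -step; dx <= step; dx += step) {
                if (dx == 0 && dy == 0)
                    continue;  /* the centre was already evaluated */
                int mx = center.dx + dx, my = center.dy + dy;
                int rx = bx + mx, ry = by + my;
                /* Skip candidates that fall outside the reference frame. */
                if (rx < 0 || ry < 0 || rx + BLK > width || ry + BLK > height)
                    continue;
                unsigned sad = block_sad(cur, ref, width, bx, by, rx, ry);
                if (sad < best_sad) {
                    best_sad = sad;
                    best.dx = mx;
                    best.dy = my;
                }
            }
        }
    }
    return best;
}
```

The three-step search is not guaranteed to find the global minimum of the SAD surface, so it trades some prediction quality for speed. For object-based coding, the SAD would also have to respect the shape mask of the VOP, which this sketch does not handle.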