CHAPTER 5 ARCHITECTURE DESIGN FOR H.264/AVC INTRA CODING
5.3. Architecture Design of H.264/AVC Intra Coding
5.3.6. CAVLC Unit
The architecture of the CAVLC unit is shown in Fig. 64. The CAVLC encoding process is divided into two phases: a scanning phase and an encoding phase. The unit receives four transformed coefficients per cycle. To speed up the encoding phase, the scanning phase skips the zero coefficients and scans only the nonzero ones in inverse zigzag order. The scanned data are then sent to the corresponding lookup tables in parallel, and the resulting codes are buffered and concatenated to form the final bitstream.
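The scanning phase can be illustrated with a small software model. This is only a sketch under stated assumptions: it uses the standard 4x4 frame zigzag order, handles one block at a time rather than four coefficients per cycle, and the function names (`scan_nonzero_reverse`, `count_trailing_ones`) are hypothetical and do not come from the proposed hardware.

```python
# Standard 4x4 zigzag scan order (frame mode), as (row, col) pairs.
ZIGZAG_4X4 = [
    (0, 0), (0, 1), (1, 0), (2, 0),
    (1, 1), (0, 2), (0, 3), (1, 2),
    (2, 1), (3, 0), (3, 1), (2, 2),
    (1, 3), (2, 3), (3, 2), (3, 3),
]


def scan_nonzero_reverse(block):
    """Scan a 4x4 block in inverse zigzag order, skipping zeros.

    Returns (levels, runs, total_zeros):
      levels      nonzero coefficients, highest frequency first
      runs        runs[i] = zeros directly before levels[i] in forward order
      total_zeros zeros located before the last nonzero coefficient
    """
    coeffs = [block[r][c] for r, c in ZIGZAG_4X4]
    levels, runs = [], []
    run = 0
    seen_nonzero = False
    for value in reversed(coeffs):
        if value != 0:
            if seen_nonzero:
                runs.append(run)  # zeros preceding the previously found level
            levels.append(value)
            run = 0
            seen_nonzero = True
        elif seen_nonzero:
            run += 1  # zeros after the last nonzero coefficient are ignored
    if seen_nonzero:
        runs.append(run)  # zeros preceding the lowest-frequency level
    return levels, runs, sum(runs)


def count_trailing_ones(levels):
    """Count up to three consecutive +/-1 values at the start of levels."""
    count = 0
    for value in levels[:3]:
        if abs(value) != 1:
            break
        count += 1
    return count


example_block = [
    [0, 3, -1, 0],
    [0, -1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
levels, runs, total_zeros = scan_nonzero_reverse(example_block)
print(levels, runs, total_zeros, count_trailing_ones(levels))
```

The values produced here (coefficient count, trailing ones, levels, total zeros, and runs) are the quantities that the encoding phase would look up in the CAVLC tables.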
Fig. 65. Memory Organization
5.3.7. Memory Organization
In the proposed architecture, two components contain memories, organized as shown in Fig. 65. The source buffer stores the input data row by row, four pixels per row. The coefficient buffer is divided into two parts to facilitate DC value access in Intra16x16 mode. With a ping-pong architecture, the data input phase and the entropy coding phase can be pipelined to improve the encoding throughput.
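The ping-pong idea can be sketched in software as two banks that swap roles after each macroblock. This is a behavioral illustration only: the class `PingPongBuffer` and its methods are hypothetical names, and the model says nothing about the actual SRAM interface or timing.

```python
class PingPongBuffer:
    """Two banks: one is filled while the other is drained."""

    def __init__(self):
        self.banks = [[], []]
        self.write_bank = 0  # index of the bank currently being filled

    def write(self, word):
        self.banks[self.write_bank].append(word)

    def read_bank(self):
        """Return the bank that was filled during the previous phase."""
        return self.banks[self.write_bank ^ 1]

    def swap(self):
        """Hand the filled bank to the reader and start filling the other."""
        self.write_bank ^= 1
        self.banks[self.write_bank] = []


buf = PingPongBuffer()
consumed = []
for mb in range(3):
    # Data input phase: store the coefficients of macroblock `mb`.
    for i in range(4):
        buf.write((mb, i))
    # Entropy coding phase, overlapped in hardware: read the previous macroblock.
    consumed.append(list(buf.read_bank()))
    buf.swap()

print(consumed)
```

In this model, the reader always sees the macroblock written one step earlier, which is the overlap that lets the input phase and the entropy coding phase run as a pipeline.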
The pipelined architecture spends 1086 cycles per macroblock, as shown in Fig. 67. The proposed architecture therefore needs a clock of only about 117.28 MHz to meet the real-time requirement of HDTV 720p (1280x720 at 30 Hz).
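The clock requirement can be checked with a short calculation. The sketch below assumes that the 1086 cycles apply to one 16x16 macroblock and that the frame dimensions are multiples of 16; the function name `required_clock_hz` is hypothetical.

```python
def required_clock_hz(width, height, fps, cycles_per_mb, mb_size=16):
    """Clock frequency needed to process every macroblock in real time."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return mbs_per_frame * fps * cycles_per_mb


freq_hz = required_clock_hz(1280, 720, 30, 1086)
print(f"{freq_hz / 1e6:.3f} MHz")  # prints "117.288 MHz"
```

A 720p frame holds 80 x 45 = 3600 macroblocks, so 30 frames per second at 1086 cycles each gives 117.288 MHz, which agrees with the quoted figure of about 117.28 MHz.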
Fig. 66. Timing schedule of proposed intra coder.
Fig. 67. Timing schedule of proposed architecture
5.4. Implementation Results
To evaluate the accuracy and efficiency of the proposed architecture, the design is implemented in UMC 0.18 µm 1P6M CMOS technology using a cell-based design flow. The chip has an area of 2.4 × 2.4 mm² (pad limited), as shown in Fig. 68. The design achieves 125 MHz under worst-case conditions. It can therefore support still image encoding at 29.46M pixels/s, and real-time intra coding of HDTV 720p video at 30 fps when clocked at 117.28 MHz. This makes it suitable for digital video and camera applications.
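The throughput figures can be related to each other with a rough calculation. Two assumptions are made here and are not stated in the text: 1086 cycles are spent per 16x16 macroblock, and the 29.46M pixels/s figure counts luma pixels only (256 per macroblock). All names in the sketch are hypothetical.

```python
CYCLES_PER_MB = 1086          # assumed to be per 16x16 macroblock
LUMA_PIXELS_PER_MB = 16 * 16  # assumption: chroma samples are not counted


def macroblocks_per_second(clock_hz, cycles_per_mb):
    return clock_hz / cycles_per_mb


# Peak luma throughput at the worst-case clock of 125 MHz.
peak_rate = macroblocks_per_second(125e6, CYCLES_PER_MB) * LUMA_PIXELS_PER_MB

# Luma pixel rate demanded by 720p at 30 frames per second.
needed_rate = 1280 * 720 * 30

# Margin between the achievable clock and the clock needed for 720p.
headroom = 125e6 / 117.28e6 - 1

print(f"peak luma throughput: {peak_rate / 1e6:.3f} Mpixel/s")
print(f"720p at 30 fps needs: {needed_rate / 1e6:.3f} Mpixel/s")
print(f"clock headroom: {headroom:.1%}")
```

Under these assumptions the peak rate lands between 29.46M and 29.47M pixels/s, above what 720p at 30 fps requires, which is consistent with the claims above.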
Table 9. List of gate count
Module                                    Gate count
Intra Predictor                                 3507
Q/IQ                                           22082
DCT (with DC register)                          9985
IDCT (with DC register)                         9836
Boundary Reconstruction Unit                   15697
Cost Generation and Mode Decision Unit         10315
UVLC/CAVLC                                     11965
Controller                                      2781
Boundary Predictor Buffer                       6465
Total                                          92633
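As a quick consistency check on Table 9, the per-module gate counts can be summed and expressed as shares of the total. The sketch below only restates the numbers from the table; the dictionary name `GATE_COUNTS` is an illustrative choice.

```python
# Gate counts copied from Table 9.
GATE_COUNTS = {
    "Intra Predictor": 3507,
    "Q/IQ": 22082,
    "DCT (with DC register)": 9985,
    "IDCT (with DC register)": 9836,
    "Boundary Reconstruction Unit": 15697,
    "Cost Generation and Mode Decision Unit": 10315,
    "UVLC/CAVLC": 11965,
    "Controller": 2781,
    "Boundary Predictor Buffer": 6465,
}

total = sum(GATE_COUNTS.values())
largest = max(GATE_COUNTS, key=GATE_COUNTS.get)

# List modules from largest to smallest with their share of the total.
for name, gates in sorted(GATE_COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<40}{gates:>7}{gates / total:>8.1%}")
print(f"{'Total':<40}{total:>7}")
```

The sum reproduces the listed total of 92633 gates, and the Q/IQ unit is the largest single module at a little under a quarter of the design.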
Technology: UMC 0.18 µm 1P6M CMOS
Voltage: 1.8 V (Core), 3.3 V (I/O)
Die size: 2.4 × 2.4 mm²
Core size: 1.28 × 1.28 mm²
SRAM (all single port):
Coefficient buffer: 104 x 64 bits x 2 banks
Source buffer: 96 x 32 bits x 1 bank
Fig. 68. Chip specification
Chapter 6 Conclusion
This thesis makes three contributions. The first is a deblocking filter architecture that accelerates the deblocking process. The two proposed architectures both reduce the memory size and achieve higher speed. The idea is to rearrange the data flow to achieve higher data reuse.
The second contribution is a fast intra coding algorithm that reduces the computational complexity of intra 4x4 prediction. Six modes are examined instead of the nine modes of the full search method. The fast intra prediction algorithm saves 33% of the computational complexity with only about 1% bit-rate loss.
The final contribution is an intra coding architecture that speeds up intra frame coding. The proposed cost function gives better quality, and the complex plane mode is skipped to save area. The prediction process is carefully scheduled to achieve high utilization. We hope that our research results can contribute to the convenience of everyday life.
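National Tainan First Senior High School (September 1996 to June 1999)
B.S., Department of Electronics Engineering, National Chiao Tung University (September 1999 to June 2003)
M.S., Systems Group, Institute of Electronics, National Chiao Tung University (September 2003 to June 2005)
Awards:
• Academic year 2004 (ROC 93): University IC Design Contest (IC Contest), graduate/undergraduate cell-based design category, Excellent Award
• Asia and South Pacific Design Automation Conference (ASP-DAC) 2005, Best Award of Student Design Contest
• Academic year 2003 (ROC 92): University Silicon IP Design Contest (IP Contest), "Star Video Motion Estimation Engine QME", Soft IP open-topic category, Outstanding Award
• Academic year 2002 (ROC 91): Yin Chih-Tung (殷之同) Electronics Experiment Project Scholarship. Project title: Automatic generation of Area-Effective Bit-Serial FIR Filters
• Academic year 2002 (ROC 91), first semester (senior year): Department of Electronics Engineering Academic Excellence Award
• Academic year 2001 (ROC 90), second semester (junior year): Department of Electronics Engineering Academic Excellence Award
• Academic year 2001 (ROC 90), first semester (junior year): Department of Electronics Engineering Academic Excellence Award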
Chao-Chung Cheng, Tian-Sheuan Chang, "Fast Three Step Intra Prediction Algorithm for 4x4 blocks in H.264," IEEE International Symposium on Circuits and Systems (ISCAS) 2005
Chao-Chung Cheng, Tian-Sheuan Chang, "An Hardware Efficient Deblocking Filter for H.264/AVC," International Conference on Consumer Electronics (ICCE) 2005
Hao-Yun Chin, Chao-Chung Cheng, Yu-Kun Lin, and Tian-Sheuan Chang, "A Bandwidth Efficient Subsampling-based Block Matching Architecture for Motion Estimation," Asia and South Pacific Design Automation Conference (ASP-DAC) 2005
Chao-Chung Cheng, Yu-Jen Wang, Tian-Sheuan Chang, "A Fast Fractional Pel Motion Estimation Algorithm for H.264/AVC," The 16th VLSI Design/CAD Symposium 2005