
Chapter 4 Entropy Decoder

4.2 CAVLC decoder

4.2.5 Performance Analysis

Table 13 lists the average number of cycles the CAVLC decoder needs to decode one macroblock.

All sequences in this simulation are in QCIF format and are all-intra encoded to represent the worst case. Because of its efficient decoding scheme, the proposed design saves up to 76% of the decoding cycles for the QP = 28 case compared with the previous design. For a fair comparison, we implemented the method proposed in and also included its required merging cycles in the cycle count of the previous design.

Table 13 Comparison of average decoding cycles per macroblock

QP / Sequence   Akiyo   Foreman   Stefan   Mobile   News
Proposed        38      53        124      174      58

In this chapter, we described the analysis and design of the CAVLC and UVLC decoders for H.264 video coding. The UVLC decoder exploits the structure of the Exp-Golomb code to determine the code length in advance. Because the code length is known, the decoder can be implemented in combinational logic instead of the lookup tables used in other VLC decoder designs. The proposed CAVLC decoder uses zero information to skip redundant zero-decoding cycles, and combines this with partial multi-symbol decoding by exploiting the properties of the CAVLC algorithm. As a result, it reduces the decoding cycles by up to 76% for the QP = 28 case compared with previous designs.
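The statement that the UVLC code length is known in advance can be illustrated in software. The Python sketch below decodes H.264 Exp-Golomb codewords: the leading-zero count M fixes the codeword length at 2M + 1, so codeNum = 2^M − 1 + INFO follows directly without a table search. The function names `decode_ue` and `ue_to_se` are hypothetical, and the '0'/'1' string input is an assumed stand-in for the hardware bit-stream shifter; this is a behavioral sketch, not the thesis's combinational circuit.

```python
from typing import Tuple


def decode_ue(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one unsigned Exp-Golomb codeword ue(v) starting at bits[pos].

    `bits` is a string of '0'/'1' characters standing in for the window of
    a bit-stream shifter. Returns (code_num, position after the codeword).
    """
    # Count leading zeros M. The codeword length is then 2*M + 1, so the
    # length is known before any INFO bits are examined.
    m = 0
    while bits[pos + m] == "0":
        m += 1
    # Skip the M zeros and the '1' marker, then read the M INFO bits.
    start = pos + m + 1
    info = int(bits[start:start + m], 2) if m > 0 else 0
    code_num = (1 << m) - 1 + info
    return code_num, start + m


def ue_to_se(code_num: int) -> int:
    """Map code_num to a signed se(v) value: 0, 1, -1, 2, -2, ..."""
    magnitude = (code_num + 1) // 2
    return magnitude if code_num % 2 == 1 else -magnitude


# Example: three concatenated codewords "1", "010", "00111".
stream = "1" + "010" + "00111"
pos, values = 0, []
while pos < len(stream):
    value, pos = decode_ue(stream, pos)
    values.append(value)
print(values)  # [0, 1, 6]
```

In hardware, the leading-zero count maps naturally to a priority encoder, which is one way the length can be produced by combinational logic.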

Chapter 5

IMPLEMENTATION RESULTS

5.1 Memory controller

Table 14 Gate count of each memory controller component

Component                    Gate count
Write address generator      1366
Read address generator       1797
Write command generator      215
Read command generator       215
Arbiter and data masker      696
Control unit                 526
Total                        4832

The proposed data-mapping-aware memory controller was designed in Verilog HDL and implemented in TSMC 0.18 µm technology. Synthesized at 133 MHz, it requires about 4.8k gates in total. Table 14 lists the synthesis result for each component. The two address generators account for most of the area (3163 of 4832 gates). The maximum SDRAM operating rate under different CAS latencies is listed in Table 15.

With a CAS latency of 3, the SDRAM can operate at a higher frequency, but the longer latency introduces more stall cycles. In our design, the CAS latency is set to 2, for which the maximum operating rate is 133 MHz. This frequency is sufficient to provide the bandwidth required for real-time decoding of our target formats.
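Whether 133 MHz is sufficient can be checked with simple arithmetic. The Python sketch below assumes a 32-bit single-data-rate SDRAM bus (the x32 organization of the MT48LC8M32B2P part in [7]), one transfer per clock, and MB = 10^6 bytes; the helper names are hypothetical. It compares the theoretical peak with the 46.96 MB/s that the conclusion reports for luma and chroma motion compensation of 525SD video at QP = 20.

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak of an SDR SDRAM bus in MB/s (10**6 bytes/s).

    Assumes one data word is transferred per clock cycle, with no
    activation, precharge, refresh or CAS-latency overhead.
    """
    return clock_mhz * bus_width_bits / 8


def utilization(required_mb_s: float, clock_mhz: float,
                bus_width_bits: int) -> float:
    """Fraction of the peak bandwidth consumed by a required data rate."""
    return required_mb_s / peak_bandwidth_mb_s(clock_mhz, bus_width_bits)


# Assumed configuration: 32-bit bus at 133 MHz (CAS latency 2).
peak = peak_bandwidth_mb_s(133, 32)   # 532.0 MB/s
share = utilization(46.96, 133, 32)   # about 0.088
print(f"peak = {peak:.1f} MB/s, required share = {share:.1%}")
```

The peak figure ignores row activation, precharge, refresh and CAS-latency stalls. These overheads are what the proposed controller tries to reduce, so the real margin is smaller than this ratio suggests.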

Table 15 Maximum SDRAM operating rate under different CAS latencies

Table 16 Gate count of each entropy decoder component

Component                                                              Gate count
CAVLC decoder                                                          11724
    Coeff_token table                                                  1764
    Run_before table                                                   737
    Total_zero table                                                   419
    Level decoder                                                      1312
    Bit-stream shifter                                                 1732
    Datapath (merging unit, level buffer, coeff. buffer and
    zero detection unit)                                               5638
    Control unit                                                       105
UVLC decoder                                                           1845

The proposed entropy decoder was designed in Verilog HDL and implemented in TSMC 0.18 µm technology. Synthesized at 125 MHz, it requires about 11k gates for the CAVLC decoder and 1.8k gates for the UVLC decoder. Most of the CAVLC decoder area is spent on the datapath, which includes the merging unit, level buffer, coefficient buffer and zero detection unit. Table 16 lists the detailed synthesis result for each component, and Table 17 compares the hardware cost with previous designs. For a fair comparison, the gate count of [16] includes the level and run merging unit, which is quite similar to that of the proposed design. The design in [17] requires only 6100 gates but does not include reordering and merging logic.

Table 17 Hardware cost comparison of CAVLC decoders

Design                                  [16]      [17]      Proposed
Total gate count (excluding memory)     9943      6100      11724
Process                                 0.18 µm   0.25 µm   0.18 µm
Frequency                               125 MHz   125 MHz   125 MHz

Chapter 6

CONCLUSION

The contribution of this thesis can be divided into two parts. In Chapter 3, a data-mapping-aware H.264/AVC memory controller is presented to improve the effective external memory bandwidth. To reduce the overhead cycles caused by the access characteristics of SDRAM, we analyzed the statistics of video sequences and developed two methods to reduce memory access latency. For intra-request optimization, we derive a mapping between image positions and memory locations that minimizes the occurrence of row misses. For inter-request optimization, we check whether a row must be opened before each access and skip unnecessary row activation and closing (precharge) operations. As a result, the row miss rate is about 1.8% for the 525SD video format with QP = 20, and the required bandwidth is 46.96 MB/s for real-time decoding of luma and chroma motion compensation.
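The inter-request optimization amounts to checking each bank's open row before issuing commands. The Python sketch below models a generic open-row check; the class `OpenRowTracker`, the (bank, row) request format and the read-only command list are illustrative assumptions. It does not model the thesis's intra-request data mapping, SDRAM timing, or refresh, and it counts the first access to each bank as a miss.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OpenRowTracker:
    """Hypothetical model of an open-row check in front of an SDRAM.

    Each bank remembers which row is currently open. A request to that row
    is a row hit and needs only the column command; otherwise the old row
    (if any) is precharged and the requested row is activated first.
    """
    open_rows: Dict[int, int] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def access(self, bank: int, row: int) -> List[str]:
        """Return the command sequence issued for one read request."""
        commands: List[str] = []
        current = self.open_rows.get(bank)
        if current == row:
            self.hits += 1                     # row already open: skip ACTIVE/PRECHARGE
        else:
            self.misses += 1
            if current is not None:
                commands.append("PRECHARGE")   # close the previously open row
            commands.append("ACTIVE")          # open the requested row
            self.open_rows[bank] = row
        commands.append("READ")                # column access
        return commands

    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


# Example: (bank, row) pairs for five read requests.
tracker = OpenRowTracker()
for bank, row in [(0, 5), (0, 5), (1, 2), (0, 5), (0, 7)]:
    print(bank, row, tracker.access(bank, row))
print(f"row miss rate = {tracker.miss_rate():.0%}")  # 3 misses out of 5 requests
```

A data mapping that keeps spatially adjacent blocks in the same row (or spreads them across banks) raises the hit count in such a model, which is the effect the intra-request optimization targets.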

In Chapter 4, an entropy decoder supporting partial multi-symbol decoding, zero skipping and skipped merging of run and level is proposed. The synthesis results show that the gate count is about 11k for the CAVLC decoder and 1.8k for the UVLC decoder, and the CAVLC decoder reduces the decoding cycles by up to 76% for the QP = 28 case compared with other designs. This design is well suited to applications that demand low power or high video quality.
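Zero skipping relies on a property of the CAVLC syntax: once zerosLeft reaches zero, every remaining run_before is zero and no further run_before codeword is parsed, and the last coefficient takes whatever zeros remain without a codeword. The Python sketch below shows how many run_before lookups this saves. The function `assign_runs` and its iterator input are hypothetical; a real decoder would look up each run_before from the bit-stream using a table selected by zerosLeft. The sketch models only run assignment, not the thesis's partial multi-symbol datapath.

```python
from typing import Iterator, List, Tuple


def assign_runs(total_coeff: int, total_zeros: int,
                run_before_values: Iterator[int]) -> Tuple[List[int], int]:
    """Return (run_before per coefficient, number of codewords read).

    Coefficients are visited from highest frequency to lowest.
    `run_before_values` is a hypothetical iterator of already-decoded
    run_before values standing in for the run_before table lookup.
    """
    runs: List[int] = []
    reads = 0
    zeros_left = total_zeros
    for i in range(total_coeff):
        if i == total_coeff - 1:
            run = zeros_left                  # last coefficient: implied, no codeword
        elif zeros_left > 0:
            run = next(run_before_values)     # one run_before lookup
            reads += 1
        else:
            run = 0                           # zero skipping: nothing left to decode
        runs.append(run)
        zeros_left -= run
    return runs, reads


# Example: 5 nonzero coefficients with total_zeros = 3. Only 2 of the
# 4 possible run_before codewords are read; the other runs are inferred.
print(assign_runs(5, 3, iter([2, 1])))  # ([2, 1, 0, 0, 0], 2)
```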

For future work, the main idea of the memory controller remains applicable to advanced off-chip memories such as DDR and DDR2 SDRAM. Multi-channel techniques and more sophisticated arbitration policies could be combined with the proposed memory controller to improve performance for advanced coding tools such as B-frames, weighted prediction and direct mode. The hardware cost of the entropy decoder can be further reduced by optimizing the implementation of the VLC tables. We sincerely hope that these research results will contribute to the advancement of video applications and the convenience of everyday life.

REFERENCES

[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, ITU-T Recommendation H.264 and ISO/IEC 14496-10 AVC, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, JVT-G050, Mar. 2003.

[2] Generic Coding of Moving Pictures and Associated Audio Information – Part 2: Video, ITU-T Recommendation H.262 and ISO/IEC 13818-2, Draft International Standard, Nov. 1994.

[3] Coding of Audio-Visual Objects – Part 2: Visual, ISO/IEC 14496-2, International Standard: 1999/Amd1:2000, Jan. 2000.

[4] A. Puri, X. Chen, A. Luthra, “Video Coding Using the H.264/MPEG-4 AVC Compression Standard,” Signal Processing: Image Communication, vol. 19, pp. 793–849, 2004.

[5] Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, ISO/IEC 11172-2, International Standard, Nov. 1992.

[6] D. Marpe, H. Schwarz, T. Wiegand, “Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 620–636, July 2003.

[7] Micron Technology, Inc., MT48LC8M32B2P 256 Mb SDRAM [Online]. Available: http://www.micron.com/products/dram/sdram/

[8] J.-H. Li, N. Ling, “Architecture and bus-arbitration schemes for MPEG-2 video decoder,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 727–736, Aug. 1999.

[9] N. Ling, N.-T. Wang, D.-J. Ho, “An efficient controller scheme for MPEG-2 video decoder,” IEEE Transactions on Consumer Electronics, vol. 44, pp. 451–458, May 1998.

[10] H. Kim, I.-C. Park, “High-performance and low-power memory-interface architecture for video processing applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 1160–1170, Nov. 2001.

[11] S.-I. Park, Y. Yi, I.-C. Park, “High performance memory mode control for HDTV decoders,” IEEE Transactions on Consumer Electronics, vol. 49, pp. 1348–1353, Nov. 2003.

[12] H.-Y. Kang, K.-A. Jeong, J.-Y. Bae, Y.-S. Lee, S.-H. Lee, “MPEG4 AVC/H.264 decoder with scalable bus architecture and dual memory controller,” Proc. International Symposium on Circuits and Systems, vol. 2, pp. II-145–148, May 2004.

[13] J. Zhu, L. Hou, W. Wu, R. Wang, C. Huang, J.-T. Li, “High Performance Synchronous DRAMs Controller in H.264 HDTV Decoder,” Proc. International Conference on Solid-State and Integrated Circuits Technology, vol. 3, pp. 1621–1624, Oct. 2004.

[14] S.-Z. Wang, “A Flexible Motion Compensation Memory Organization for Dual-standard Video Decoder,” Master's thesis, Department of Electronics Engineering & Institute of Electronics, College of Electrical & Computer Engineering, National Chiao Tung University.

[15] D. Wu, W. Gao, M. Hu, Z. Ji, “An Exp-Golomb encoder and decoder architecture for JVT/AVS,” Proc. International Conference on ASIC, vol. 2, pp. 910–913, Oct. 2003.

[16] H.-C. Chang, C.-C. Lin, J.-I. Guo, “A Novel Low-Cost High-Performance VLSI Architecture for MPEG-4 AVC/H.264 CAVLC Decoding,” Proc. ISCAS, pp. 6110–6113, 2005.

[17] D. W, G. Wen, M. H., and Z. Ji, “A VLSI architecture design of CAVLC decoder,” Proc. International Conference on ASIC, pp. 962–965, 2003.

[18] Y.-H. Moon, G.-Y. Kim, and J.-H. Kim, “An efficient decoding of CAVLC in H.264/AVC video coding standard,” IEEE Transactions on Consumer Electronics, vol. 51, pp. 933–938, Aug. 2005.

[19] G.-S. Yu, T.-S. Chang, “A zero-skipping multi-symbol CAVLC decoder for MPEG-4 AVC/H.264,” Proc. ISCAS, pp. 4, May 2006.

[20] Joint Video Team reference software JM8.6.

[21] S.-M. Lei, M.-T. Sun, “An entropy coding system for digital HDTV applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 1, no. 1, pp. 147–155, Mar. 1991.

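Author's Biography

Name: 余國亘    Native place: Taoyuan, Taiwan

Education:

National Wu-Ling Senior High School (Sep. 1997 to Jun. 2000)

Department of Electronics Engineering, National Chiao Tung University (Sep. 1997 to Jun. 2000)

Institute of Electronics (System Division), National Chiao Tung University (Sep. 1997 to Jun. 2000)

Publications:

[1] G.-S. Yu, T.-S. Chang, “A zero-skipping multi-symbol CAVLC decoder for MPEG-4 AVC/H.264,” Proc. ISCAS, pp. 4, May 2006.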
