
Performance Analysis and Implementation Results

4.3 Coding Speed Estimation

Table 4.2 lists the estimated encoding speed of our implementation after optimization. Four test images of the same size but different content were encoded, and they give very similar results. Each image is 512 × 512 pixels in 24-bit RGB, so one image contains 512 × 512 × 3 = 786,432 sub-pixels. Each image was encoded to a WDP file in lossless mode, and the encoding times are also listed in Table 4.2. With a measurement clock period of 10 ns, the calculated throughput exceeds 1 pixel/cycle for every image.

We next compare this performance with related works. In [9], the JPEG XR encoding process is decomposed into (1) PCT/POT, (2) quantization, (3) prediction, (4) adaptive scanning, and (5) entropy coding, and these five stages operate as a pipeline. The throughput of that design is limited by the PCT/POT module and reaches 0.80 pixel/cycle. However, the design packs the entropy-coded results and flexbits into bitstreams on a CPU, and no hardware architecture for the packetizer appears to be described. In [8], the encoding process is decomposed into three main stages, the same as in ours. According to [8] and [9], the third stage is further decomposed into three phases that operate as a pipeline. This reduces the critical-path delay to 1/3, and the design achieves 112 fps for the 4:4:4 CIF format at 62.5 MHz, which is equivalent to 0.54 pixel/cycle for one component.
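The throughput figures can be reproduced from the encoding times with a few lines of arithmetic. The following is a minimal sketch, assuming the encoding times of Table 4.2 and the 10 ns clock period stated above; the names `CLOCK_PERIOD_NS`, `SAMPLES_PER_IMAGE`, `ENCODING_TIME_NS`, and `throughput` are illustrative and do not come from the original implementation.

```python
# Encoding times copied from Table 4.2; the 10 ns clock period is the one stated in the text.
CLOCK_PERIOD_NS = 10
SAMPLES_PER_IMAGE = 512 * 512 * 3  # 786,432 sub-pixels in a 512 x 512, 24-bit image

ENCODING_TIME_NS = {
    "Lena.bmp": 7_359_835,
    "Baboon.bmp": 7_198_545,
    "Peppers.bmp": 7_233_145,
    "F16.bmp": 7_244_795,
}


def throughput(encoding_time_ns: int) -> float:
    """Return the number of sub-pixels encoded per clock cycle for one image."""
    cycles = encoding_time_ns / CLOCK_PERIOD_NS
    return SAMPLES_PER_IMAGE / cycles


THROUGHPUT = {name: throughput(t) for name, t in ENCODING_TIME_NS.items()}

if __name__ == "__main__":
    for name, value in THROUGHPUT.items():
        print(f"{name}: {value:.3f} pixel/cycle")
```

The computed values agree with the throughput row of Table 4.2 to within rounding of the last digit.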

The performance comparison is summarized in Table 4.3. The throughput is listed for one component in the YUV 4:4:4 case. In [8] and [9], although the entropy coding is pipelined in three phases, the encoding process is still bottlenecked by the entropy coding.

With our proposed optimization, the throughput is about twice that of these related works.

Therefore, our proposed architecture achieves higher performance than the other related works.
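As a rough check of the figures quoted above, the conversion from frame rate to pixel/cycle and the resulting speed-up can be worked out directly. This sketch assumes the standard CIF resolution of 352 × 288 and counts all three components of a 4:4:4 frame; the reference value 0.54 and our values 1.069 and 1.092 are the ones reported in Tables 4.2 and 4.3.

```latex
\[
\frac{352 \times 288 \times 3 \times 112}{62.5 \times 10^{6}} \approx 0.545,
\qquad
\frac{1.069}{0.54} \approx 1.98,
\qquad
\frac{1.092}{0.54} \approx 2.02 .
\]
```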

Table 4.2: Benchmark of JPEG XR encoder

| Image | Lena.bmp | Baboon.bmp | Peppers.bmp | F16.bmp |
|---|---|---|---|---|
| Resolution | 512 × 512 | 512 × 512 | 512 × 512 | 512 × 512 |
| Total pixels (YUV 4:4:4) | 786,432 | 786,432 | 786,432 | 786,432 |
| WDP file size (bytes) | 454K | 591K | 493K | 395K |
| Encoding time (ns) | 7,359,835 | 7,198,545 | 7,233,145 | 7,244,795 |
| Throughput (pixel/cycle) | 1.069 | 1.092 | 1.087 | 1.085 |

Table 4.3: Performance comparison among related works and the proposed architecture

| Architecture | Throughput (pixel/cycle) | Notes |
|---|---|---|
| [9] | 0.54 | |
| [10] | 0.80 | No packetizer |
| [11] | 1.58 | Only PCT/POT |
| Ours | 1.00 | |

To evaluate the proposed architecture, our JPEG XR encoder was designed in Verilog HDL and synthesized with Synopsys Design Compiler using a 0.18 µm CMOS standard cell library. The results are summarized in Table 4.4. Synthesis used a target frequency of 100 MHz. The gate area is reported in the gate count column, and the SRAM column lists the SRAM sizes of each module. The gate count of the designed JPEG XR encoder is 235,377. The required SRAM is 992 × 3 channels, of which the prediction buffer is configured as 480 × 3 channels when the horizontal size of the input image is 1920 pixels.

Table 4.4: Gate count summary of the proposed architecture

Paper [9] [10] [11] Ours

Frequency 62.5 MHz (0.18um) 125 MHz (0.13um) 250 MHz (90nm) 100 MHz (0.18um)

Bus Width 32 bits 14 bits 16 bits

Module name Gate count SRAM Gate count SRAM Gate count SRAM Gate count SRAM PCT/POT

316,898 256 × 3 90,692 256 × 2

100,500

1640

73,340

256 × 3

Quantization (POT1.2) (only PCT)

Prediction 81,980 256 × 3

6,726 120 × 7 None 62,029 256 × 3

480 × 3 480 × 3

CBP Unknown None 1,946

Adaptive scan 105,323 41,990 None 63,009

Entropy coding None 32,069

Packetizer None None 2,806

Control Unit 1,866 178

Top 506,167 992 × 3 142,157 1,352 235,377 992 × 3

Chapter 5 Conclusion

In this paper, we propose a novel, faster hardware architecture for JPEG XR, a new image coding standard with advanced compression capability. A three-stage pipelined lossless JPEG XR encoder for YUV 4:4:4 was designed to support next-generation HDR displays. In previous architectures, the encoding throughput is limited by the entropy coding stage, because that stage processes coefficients according to statistics gathered from the macroblock currently being encoded.

We generalized the characteristics of Normalization (Update ModelBits) and took advantage of the reduction of Levels. The proposed pipeline controller optimizes the forward steps of the encoding to reduce unnecessary data processing. As a result, all encoding processes, including the entropy coding, can be safely pipelined, and the encoder achieves higher throughput than the related works. Compared with the complete and most similar related architecture [9], the proposed structure is twice as fast.

However, detecting the completion of each processing stage and the asymmetric execution times of the pipeline stages still cause processing units to be held and lower the coding speed. Increasing the record length of the pipeline buffers or adding an adaptive buffer controller may overcome this bottleneck; this remains future work.
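The effect of asymmetric stage times can be illustrated with a toy timing model. The sketch below is not the pipeline controller of this work: the per-macroblock cycle counts in `TIMES` are invented for illustration, and `lockstep_makespan` and `buffered_makespan` are hypothetical helper names. It compares a three-stage pipeline whose stages advance together (each step lasts as long as its slowest active stage) with one whose stages are decoupled by unbounded buffers.

```python
from typing import List


def lockstep_makespan(times: List[List[int]]) -> int:
    """Total cycles when all stages advance together.

    times[s][i] is the number of cycles stage s needs for macroblock i.
    Each pipeline step lasts as long as the slowest stage active in it.
    """
    n_stages, n_items = len(times), len(times[0])
    total = 0
    for step in range(n_items + n_stages - 1):
        active = [
            times[s][step - s]
            for s in range(n_stages)
            if 0 <= step - s < n_items
        ]
        total += max(active)
    return total


def buffered_makespan(times: List[List[int]]) -> int:
    """Total cycles when stages are decoupled by unbounded buffers.

    A stage starts macroblock i as soon as the previous stage has
    finished it and the stage itself has finished macroblock i - 1.
    """
    n_stages, n_items = len(times), len(times[0])
    done = [[0] * n_items for _ in range(n_stages)]
    for s in range(n_stages):
        for i in range(n_items):
            ready = done[s - 1][i] if s > 0 else 0  # input available
            free = done[s][i - 1] if i > 0 else 0   # stage idle again
            done[s][i] = max(ready, free) + times[s][i]
    return done[-1][-1]


# Hypothetical cycle counts for three stages and four macroblocks.
TIMES = [
    [1, 4, 1, 4],
    [1, 4, 1, 4],
    [2, 2, 2, 2],
]

if __name__ == "__main__":
    print("lockstep:", lockstep_makespan(TIMES))
    print("buffered:", buffered_makespan(TIMES))
```

In this model the buffered pipeline never takes longer than the lockstep one, which is the intuition behind deeper pipeline buffers; a real design would have finite buffers and would fall between the two cases.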

Bibliography

[1] Microsoft Corporation, HD Photo specification version 1.0, Nov. 2006.

[2] S. Srinivasan, C. Tu, S. L. Regunathan, and G. J. Sullivan, “HD Photo: a new image coding technology for digital photography,” in Proc. SPIE, vol. 6696, Aug. 2007.

[3] Microsoft Corporation, “HD Photo device porting kit,” Nov. 2006.

[4] ISO/IEC 15444-1, “Information technology - JPEG 2000 image coding system - Part 1: Core coding system,” 2002.

[5] H. S. Malvar, “Biorthogonal and nonuniform lapped transforms for transform coding with reduced blocking and ringing artifacts,” IEEE Transactions on Signal Processing, vol. 46, pp. 1043-1053, Apr. 1998.

[6] A. Maalouf and M.-C. Larabi, “Low-complexity hierarchical lapped transform for lossy-to-lossless image coding in JPEG XR / HD Photo,” in Proc. 16th IEEE International Conference on Image Processing (ICIP), pp. 5-8, 2009.

[7] S. Groder, Modeling and synthesis of the HD Photo compression algorithm, Master's thesis, Rochester Institute of Technology, Aug. 2008.

[8] C.-H. Pan, C.-Y. Chien, W.-M. Chao, S.-C. Huang, and L.-G. Chen, “Architecture design of full HD JPEG XR encoder for digital photography applications,” IEEE Transactions on Consumer Electronics, vol. 54, pp. 963-971, Aug. 2008.

[9] C.-Y. Chien, S.-C. Huang, C.-H. Pan, C.-M. Fang, and L.-G. Chen, “Pipelined arithmetic encoder design for lossless JPEG XR encoder,” in Proc. of 13th IEEE Inter-

[10] K. Hattori, H. Tsutsui, H. Ochi, and Y. Nakamura, “A high-throughput pipelined architecture for JPEG XR encoding,” in Proc. IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia, pp. 9-17, 2009.

[11] S.-W. Fan, J.-W. Chen, and J.-I. Guo, “Low bandwidth HD1080@60FPS JPEG-XR transform design,” in VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2012.

[12] L. V. Agostini, I. S. Silva, and S. Bampi, “Pipelined entropy coders for JPEG compression,” in Proc. 15th Symposium on Integrated Circuits and Systems Design, pp. 9-14, Sept. 2002.
