Chapter 4 Simulation Results and Chip Implementation
4.2 Chip Implementation
For the chip implementation, a cell-based design flow with the Artisan standard cell library is adopted, and the proposed architecture has been implemented in the TSMC 0.18-um CMOS process. Synopsys Design Compiler is used to synthesize the RTL design of the proposed architecture, Cadence SOC Encounter is used for placement and routing (P&R), and Synopsys PrimePower is used to measure the power consumption of each mode after post-layout simulation. Table 4.3 summarizes the chip characteristics of the proposed architecture.
Table 4.3. Chip characteristics of the proposed architecture.
Active Chip Area        1.13 x 1.13 mm2
Gate Count              97,246
Max Clock Frequency     100 MHz
Process Technology      TSMC 0.18-um CMOS

Power Consumption (mW)          @ 100 MHz    @ 66.7 MHz
One-Plane Type                  22.75        15.18
Two-Plane Type (rising)         51.76        34.52
Two-Plane Type (vertical)       56.25        37.51
Two-Plane Type (horizontal)     71.9         57.26
Two-Plane Type (falling)        57.63        38.43
Uncompression Mode              38.63        25.76
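The power figures in Table 4.3 can be cross-checked with a little arithmetic. The sketch below is illustrative and not part of the original design flow. It assumes, to first order, that dynamic power at a fixed supply voltage scales linearly with clock frequency, so the ratio of the 66.7 MHz figure to the 100 MHz figure should be close to 0.667. It also converts power into energy per clock cycle. The dictionary keys and function names are hypothetical.

```python
# Power figures in mW, copied from Table 4.3 as (at 100 MHz, at 66.7 MHz).
# The mode names used as keys are illustrative, not identifiers from the design.
POWER_MW = {
    "one_plane": (22.75, 15.18),
    "two_plane_rising": (51.76, 34.52),
    "two_plane_vertical": (56.25, 37.51),
    "two_plane_horizontal": (71.9, 57.26),
    "two_plane_falling": (57.63, 38.43),
    "uncompression": (38.63, 25.76),
}

F_HIGH_MHZ = 100.0
F_LOW_MHZ = 66.7


def power_ratio(mode):
    """Ratio of power at 66.7 MHz to power at 100 MHz for one mode."""
    p_high, p_low = POWER_MW[mode]
    return p_low / p_high


def energy_per_cycle_nj(power_mw, freq_mhz):
    """Average energy per clock cycle in nJ (mW divided by MHz gives nJ)."""
    return power_mw / freq_mhz


if __name__ == "__main__":
    print(f"frequency ratio: {F_LOW_MHZ / F_HIGH_MHZ:.3f}")
    for mode, (p_high, _) in POWER_MW.items():
        print(
            f"{mode:22s} power ratio {power_ratio(mode):.3f}, "
            f"energy/cycle at 100 MHz {energy_per_cycle_nj(p_high, F_HIGH_MHZ):.4f} nJ"
        )
```

Under this linear-scaling assumption, five of the six modes give a ratio near 0.667. The horizontal two-plane case gives roughly 0.80, so that pair of entries may be worth checking against the original measurements.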
Fig. 4.6. Chip layout of the proposed architecture.
Chapter 5 Conclusion and Future Work
In this work, a reconfigurable algorithm for depth buffer compression is presented. The proposed algorithm supports the 1-bit HA, 2-bit DDPCM, and 7-bit DDPCM schemes, and handles both one-plane and two-plane type compression. In addition, different compression schemes can be applied to the vertical and horizontal parts of a tile. In total, 11 compression modes are applied adaptively according to the 3D scene. For the two-plane type, the algorithm considers four combination cases: rising, vertical, horizontal, and falling.
For an 8x8 tile size with 16-bit depth values under the teapot benchmark, the proposed reconfigurable algorithm achieves an average compression ratio (CR) of 1.75, an improvement of 13.6% and 31.6% over the HA and DDPCM compression methods, respectively. Under the Stereoscopic polygons benchmark with the same tile size and depth precision, it achieves an average CR of 1.74, an improvement of 21.7% and 38.1% over the HA and DDPCM compression methods, respectively.
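To make these ratios concrete, the following sketch converts a CR into an average compressed tile size and back-computes the baseline CRs implied by the stated improvements. It rests on two assumptions not spelled out in this chapter: CR is taken as uncompressed size divided by compressed size, and each improvement percentage is taken as a relative increase in CR over the baseline method. The function names are hypothetical.

```python
# An 8x8 tile of 16-bit depth values, as used in the reported benchmarks.
TILE_BITS = 8 * 8 * 16  # 1024 bits uncompressed


def compressed_bits(cr):
    """Average compressed tile size in bits, assuming CR = uncompressed / compressed."""
    return TILE_BITS / cr


def baseline_cr(cr, improvement_pct):
    """Baseline CR implied by a relative improvement (assumed: cr = baseline * (1 + pct/100))."""
    return cr / (1.0 + improvement_pct / 100.0)


if __name__ == "__main__":
    # (benchmark, proposed CR, improvement over HA in %, improvement over DDPCM in %)
    results = [
        ("teapot", 1.75, 13.6, 31.6),
        ("Stereoscopic polygons", 1.74, 21.7, 38.1),
    ]
    for name, cr, over_ha, over_ddpcm in results:
        print(
            f"{name}: about {compressed_bits(cr):.0f} of {TILE_BITS} bits per tile, "
            f"implied HA CR {baseline_cr(cr, over_ha):.2f}, "
            f"implied DDPCM CR {baseline_cr(cr, over_ddpcm):.2f}"
        )
```

If the improvements were instead defined on saved bandwidth or on compressed size, the implied baseline values would differ, so these numbers are only a reading aid.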
Furthermore, the proposed reconfigurable and power-efficient depth buffer compression architecture has been verified and implemented in the TSMC 0.18-um CMOS process. The core has a gate count of 97,246 and an active area of 1.13 x 1.13 mm2. At the maximum clock frequency of 100 MHz and a supply voltage of 1.8 V, it consumes 38.63 mW in uncompression mode, 22.75 mW in the one-plane type, 51.76/56.25/71.9 mW in the rising/vertical/horizontal cases of the two-plane type, and 57.63 mW in the falling case of the two-plane type.
In future work, the ranges of the horizontal and vertical parts will be investigated to achieve better compression performance.