Chapter 4 Comparison Results and Chip Implementation
4.2 Chip Implementation and Comparison Results
The proposed GE architecture is implemented with a cell-based design flow using the Faraday standard cell library in the UMC 90-nm CMOS process. Synopsys Design Compiler synthesizes the RTL design of the proposed architecture, Cadence SOC Encounter performs automatic placement and routing (APR), and Synopsys PrimePower measures the power consumption in post-layout simulation.
Table 4.4 summarizes the chip characteristics of the proposed GE architecture and the corresponding chip layout is shown in Fig. 4.1.
Table 4.4: Chip characteristics of the proposed GE architecture
Power Supply          1.0 V
Process Technology    UMC 90-nm CMOS
Max. Clock            200 MHz
Max. Power            5.89 mW
Gate Count            195K
Core Area             0.58 mm²
Fig. 4.1: Chip layout of the GE (labeled blocks: Vertex Cache 1, Vertex Cache 2, PPU, Constant Memory, VPU).
The same teapot benchmark is rendered at three subdivision levels, level-0, level-1, and level-2, as shown in Figs. 4.2 (a), (b), and (c), respectively. The power consumption at each subdivision level is measured and illustrated in Fig. 4.3.
Fig. 4.2: Rendering results at different subdivision levels: (a) level-0, (b) level-1, (c) level-2.
Fig. 4.3: Power profiling of different subdivision levels: level-2, 8.71 mW (100%); level-1, 6.92 mW (79.44%); level-0, 5.89 mW (67.62%).
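The percentages in the power profile are consistent with each level's power normalized to the level-2 measurement. The following is a minimal sketch of that normalization, assuming the largest value (8.71 mW) belongs to level-2 and the smallest (5.89 mW) to level-0; the dictionary and function names are illustrative and do not come from the thesis.

```python
# Measured power per subdivision level (mW), as read from the power profile.
# Assumption: the highest value belongs to level-2 and the lowest to level-0.
power_mw = {"level-2": 8.71, "level-1": 6.92, "level-0": 5.89}


def relative_power(power, reference="level-2"):
    """Return each level's power as a percentage of the reference level."""
    ref = power[reference]
    return {level: 100.0 * p / ref for level, p in power.items()}


ratios = relative_power(power_mw)
for level, pct in ratios.items():
    print(f"{level}: {power_mw[level]:.2f} mW ({pct:.2f}% of level-2)")
```

Under this reading, dropping from level-2 to level-0 saves roughly a third of the power, which is the trade-off the scalable shading quality exposes.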
The comparison between prior work and this work is summarized in Table 4.5. Compared with [25][26][27][28][29], the proposed GE achieves a better power-efficiency index of 16.978 Mvertices/(s·mW). Moreover, with the proposed subdivision algorithm, the proposed GE provides near-Phong shading quality.
Table 4.5: Comparison results among the existing work
                                      ISSCC’04 [25]  JSSC’06 [26]  JSSC’07 [27]  ISSCC’07 [28]  JSSC’08 [29]   This Work
Process (nm)                          130            180           180           180            180            90
Frequency (MHz)                       400            200           100           200            50             200
Polygon Rate (Mvertices/s)            36             50            120           141            25*1 / 12.5    100*1 / 50
Power (mW)                            250            155           157           86             8.6            5.89
Core Area (mm²)                       -              23            16            9.7            6.05*2         0.58
Power Efficiency (Mvertices/(s·mW))   0.144          0.323         0.764         1.64           2.907          16.978
Feature                               Graphics, DSP  Graphics      Graphics      Graphics       Graphics, DSP  Graphics with scalable-quality hardware support
*1: Assume hit rate is 50%.
*2: The core area is 2.164 mm × 2.797 mm.
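The power-efficiency row of Table 4.5 can be reproduced by dividing each polygon rate by the corresponding power. The following is a minimal sketch; the dictionary layout and function name are illustrative, and for [29] and this work it uses the rates marked *1 (50% hit rate), which is what the tabulated efficiencies appear to assume.

```python
# (polygon rate in Mvertices/s, power in mW), copied from Table 4.5.
# For [29] and this work, the *1 rates (50% cache hit rate assumed) are used.
designs = {
    "ISSCC'04 [25]": (36.0, 250.0),
    "JSSC'06 [26]": (50.0, 155.0),
    "JSSC'07 [27]": (120.0, 157.0),
    "ISSCC'07 [28]": (141.0, 86.0),
    "JSSC'08 [29]": (25.0, 8.6),
    "This work": (100.0, 5.89),
}


def power_efficiency(rate_mvps, power_mw):
    """Power efficiency in Mvertices/(s*mW): throughput per unit power."""
    return rate_mvps / power_mw


efficiency = {name: power_efficiency(r, p) for name, (r, p) in designs.items()}
for name, value in efficiency.items():
    print(f"{name:15s} {value:7.3f} Mvertices/(s*mW)")
```

With the non-cached rate of 50 Mvertices/s, the same formula would give about 8.49 Mvertices/(s·mW) for this work, so the comparison depends on the hit-rate assumption in footnote *1.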
Chapter 5
Conclusion
In this work, a low-complexity subdivision algorithm and a power-efficient GE are presented. Five low-complexity techniques, including the triangle filtering scheme, the dual space subdivision, the setup variable sharing, and the edge function recovery scheme, are proposed to reduce the computational complexity of the subdivision algorithm. The proposed geometry engine employs several techniques to optimize power, area, and shading quality. With the post-TnL vertex cache and the object space culling scheme, redundant transform and lighting computation is eliminated. With the proposed RDP, the area is reduced because the same set of PEs can be reconfigured for different operating modes. The dedicated hardware supports scalable, near-Phong shading quality at three subdivision levels: level-0, level-1, and level-2. The chip implementation results show that the proposed geometry engine achieves a power efficiency of 16.978 Mvertices/(s·mW).
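To illustrate how a post-TnL vertex cache removes redundant transform and lighting work, the following is a minimal software sketch. It is not the cache organization of the proposed GE: the FIFO replacement policy, the cache size of 16 entries, the function name, and the example index stream are all assumptions made for illustration.

```python
from collections import deque


def simulate_post_tnl_cache(indices, cache_size=16):
    """Count hits and misses for an indexed vertex stream.

    Assumption: a FIFO cache keyed by vertex index. A miss stands for one
    transform-and-lighting (TnL) computation; a hit reuses a stored result.
    """
    cache = deque(maxlen=cache_size)  # oldest entry is evicted automatically
    hits = 0
    for idx in indices:
        if idx in cache:
            hits += 1          # already transformed and lit: skip TnL
        else:
            cache.append(idx)  # miss: run TnL once and store the result
    misses = len(indices) - hits
    return hits, misses


# Two triangles sharing an edge: (0, 1, 2) and (2, 1, 3).
hits, misses = simulate_post_tnl_cache([0, 1, 2, 2, 1, 3])
hit_rate = hits / (hits + misses)
print(f"hits={hits}, misses={misses}, hit rate={hit_rate:.2f}")
```

In this toy stream only four of the six vertex references require TnL; meshes with more shared vertices raise the hit rate further.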
Bibliography
[1] P. Cesar, P. Vuorimaa, and J. Vierinen, “A graphics architecture for high-end interactive television terminals,” ACM Trans. Multimedia Comput. Commun. and Appl., vol. 2, no. 4, pp. 343-357, Nov. 2006.
[2] B.-S. Liang, Y.-C. Lee, W.-C. Yeh, and C.-W. Jen, "Index rendering: hardware-efficient architecture for 3-D graphics in multimedia system," IEEE Trans. Multimedia, vol. 4, no. 3, pp. 343-360, Sep. 2002.
[3] H. Gouraud, “Continuous shading of curved surfaces,” IEEE Trans. Comput., pp. 623-628, June 1971.
[4] A. Watt, “3D computer graphics,” 3rd Edition, Addison Wesley, 2000.
[5] B. T. Phong, “Illumination for computer generated pictures,” Communications of the ACM, vol. 18, no. 6, pp. 311-317, June 1975.
[6] G. Bishop, and D. M. Weimer, “Fast Phong Shading,” Proc. Computer Graphics and interactive Technique, 1986, pp.103-106.
[7] A. A. Mohamed, L. S. Kalos, and T. Horváth, “Hardware implementation of Phong shading using spherical interpolation,” Periodica Polytechnica, vol. 44, Nos 3-4, 2000.
[8] T. Barrera, A. Hast, and E. Bengtsson, “Faster shading by equal angle interpolation of vectors,” IEEE Trans. Visualization and Computer Graphics, pp.217-223, Mar. 2004.
[9] K. Harrison, D. A. P. Mitchell, and A. H. Watt., “The H-test: a method of high speed interpolative shading,” Proc. New Trends in CG., 1988, pp.106-166.
[10] J. Pöpsel and Ch. Hornung, “Highlight shading lighting and shading in a PHIGS+PEX environment,” EUROGRAPHICS, 1989, pp. 317-332.
[11] A. A. Mohamed, L. S. Kalos, G. Szijártó, T. Horváth, and T. Fóris, “Quadratic interpolation in hardware Phong shading and texture mapping,” SCCG’01, April, 2001, pp.181-188.
[12] T. Barrera, A. Hast, and E. Bengtsson, “Fast near Phong-quality software shading,” WSCG’06, January, 2006, pp.109-115.
[13] S. Bischoff, L.P. Kobbelt, and H.P. Seidel, “Toward hardware implementation of Loop subdivision,” Proc. SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, 2000, pp.41-50.
[14] Y. Cho, U. Neumann, and J. Woo, “Improved specular highlights with adaptive shading,” Proc. of CG. International, June, 1996, pp. 38-46.
[15] Y. Kamen, and L. Shirman, “Triangle rendering using adaptive subdivision,” IEEE Comput. Graph. Appl., Mar. 1998.
[16] T. Y. Sheu, L. D. Van, T. R. Jung, C. W. Lin, and T. W. Chang, "Low complexity subdivision algorithm to approximate Phong shading using forward difference," ISCAS 2009, pp. 2373-2376.
[17] J. McCormack and R. McNamara, “Tiled polygon traversal using half-plane edge functions,” Proc. SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, 2000, pp.15-21.
[18] M. Olano, and T. Greer, “Triangle scan conversion using 2D homogeneous coordinates,” Proc. SIGGRAPH/EUROGRAPHICS workshop on Graphics Hardware, August, 1997, pp.89-95.
[19] K. Chung, C.-H. Yu and L.-S. Kim, "Vertex cache of programmable geometry processor for mobile multimedia application," ISCAS 2006.
[20] C.-Y. Han, Y.-H. Im and L.-S. Kim, "Geometry engine architecture with early backface culling hardware," Computers & Graphics, pp. 415-425, 2005.
[21] Antonio G.M. Strollo and Davide De Caro, "Booth Folding Encoding for High Performance Squarer Circuits," IEEE Trans. CAS II: Analog and Digital Signal Processing, vol.50, no.5, pp.250-254, May 2003.
[22] K. H. Abed and R. E. Siferd, “CMOS VLSI implementation of a low-power logarithmic converter,” IEEE Trans. Computers, vol. 52, no. 11, pp. 1421-1433, Nov. 2003.
[23] K. H. Abed and R. E. Siferd, “CMOS VLSI implementation of a low-power antilogarithmic converter,” IEEE Trans. Computers, vol. 52, no. 9, pp. 1221-1228, Nov. 2003.
[24] B.-G. Nam, H. Kim and H.-J. Yoo, “A low-power unified arithmetic unit for programmable handheld 3-D Graphics Systems,” IEEE J. Solid-State Circuits, vol. 42, no. 8, Aug. 2007.
[25] F. Arakawa et al., “An embedded processor core for consumer applications with 2.8 GFLOPS and 36 Mpolygons/s FPU,” IEEE ISSCC, Feb. 2004, pp. 334–335.
[26] J. Sohn et al., “A 155-mW 50-Mvertices/s graphics processor with fixed-point programmable vertex shader for mobile applications,” IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1081–1091, May 2006.
[27] C. H. Yu, K. Chung, D. Kim and L.-S. Kim, "An energy-efficient mobile vertex processor with multithread expanded VLIW architecture and vertex caches," IEEE J. Solid-State Circuits, vol. 42, no. 10, Oct. 2007.
[28] B.-G. Nam, J. Lee, K. Kim, S.-J. Lee, and H.-J. Yoo, “A 52.4 mW 3-D graphics processor with 141 Mvertices/s vertex shader and 3 power domains of dynamic voltage and frequency scaling,” ISSCC 2007, pp. 278-603.
[29] S.-Y. Chien, Y.-M. Tsao, C.-H. Chang and Y.-C. Lin, “An 8.6 mW 25 Mvertices/s 400-MFLOPS 800-MOPS 8.91 mm2 multimedia stream processor core for mobile applications,” IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 2025-2035, Sept. 2008.
Publication List
International Conference Papers
[1] T. Y. Sheu, L. D. Van, T. R. Jung, C. W. Lin, and T. W. Chang, “Low complexity subdivision algorithm to approximate Phong shading using forward difference,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2009, pp. 2373-2376, Taipei, Taiwan.
[2] T. R. Jung, L. D. Van, T. Y. Sheu, C. W. Lin, W. C. Fang, “Design of multi-mode depth buffer compression for 3D graphics system,” in Proc. IEEE Int. Conf. Multimedia and Expo. (ICME), July 2008, pp. 789-792, Hannover, Germany.
[3] T. R. Jung, L. D. Van, W. C. Fang, T. Y. Sheu, "Reconfigurable depth buffer compression design for 3D graphics system," in Proc. Int. Conf. MUE., Apr. 2008, pp. 470-474, Busan, Korea.
Biography
Ten-Yao Sheu was born in Changhua, Taiwan, R.O.C., in 1983. He received the B.S. degree from National Pingtung University of Education (NPUE), Pingtung, Taiwan, in 2006, and the M.S. degree from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2009, both in computer science. His research interests are VLSI information processing algorithms and architectures for 3D graphics.