
Chapter 1 Introduction

1.2 Thesis Organization

The rest of the thesis is organized as follows. The proposed subdivision algorithm and the corresponding complexity analysis are described in Chapter 2. In Chapter 3, the proposed GE architecture is presented. The comparison results and chip implementation are addressed in Chapter 4. Last, a brief statement concludes the presentation of this thesis.

Chapter 2

Proposed Low Complexity Subdivision Algorithm

In this chapter, a low-complexity subdivision algorithm that approximates Phong shading is proposed. To reduce redundant memory accesses, the proposed algorithm subdivides triangles with the forward difference technique. Because the forward difference technique is numerically unstable, rasterization anomalies may appear on the rendered objects. Hence, an edge function recovery scheme is proposed to remove them. In a subdivision-based approximate Phong shading algorithm, the increased number of triangles raises both the computation and the power consumption. To reduce the complexity of the proposed algorithm, a dual space subdivision scheme, a triangle filtering scheme and a triangle setup variable sharing scheme are also presented. The proposed algorithm and schemes are described in detail in the following sections.

2.1 Subdivision Using Forward Difference

The forward difference method [13] is widely used to evaluate polynomial functions. Here, it is used to reduce the memory accesses for triangle subdivision. An example is illustrated in Fig. 2.1. To subdivide the triangle ΔVaVbVc in Fig. 2.1 (a), the intermediate vertices Vab, Vbc and Vca are computed. These new vertices are then packed together with the original vertices to generate four new triangles: ΔVaVabVca, ΔVabVbcVca, ΔVabVbVbc and ΔVcaVbcVc. These new triangles are output for next-stage processing. The forward difference method is used to compute the intermediate vertices. The first step is to compute the difference vectors dx and dy in the horizontal and vertical directions using Eq. (2.1) and Eq. (2.2).

where $N_S = 2^L$ denotes the number of segments on each edge of the original triangle and L is a non-negative integer. Without loss of generality, we set NS = 2, as shown in Fig. 2.1 (a).

Fig. 2.1. Illustration for subdivision using forward difference: (a) a triangle subdivided into four triangles; (b) subdivision using forward difference.

Once the difference vectors are computed, the intermediate vertices can be computed by successive additions, for example:

$V_{ca} = V_{ab} + d_x$ (2.4)

$V_{bc} = V_b + d_x$ (2.5)

Computing the intermediate vertices with the forward difference method is more efficient than other methods because generating one intermediate vertex needs only one memory access, to store the vertex. Compared with the conventional recursive subdivision algorithms [10][13][14][15][16], the forward difference method is stack free, so the number of memory accesses, and hence the power consumption, can be reduced. However, subdivision using forward difference can produce a rasterization anomaly in which pixels are lost on the rendered object. As shown in Fig. 2.2 (a), (b), (c) and (d), the empty pixels on the teapot, pawn, Venus and couch are the lost pixels. The cause of the anomaly is the numerical instability of subdividing the triangle with the forward difference scheme. An example is illustrated in Fig. 2.3, where two adjacent triangles are subdivided using forward difference. In Fig. 2.3 (a), Vm denotes an intermediate vertex on the edge shared by the two triangles. If the calculation had no error, it could be obtained by subdividing either the left triangle or the right triangle. In Fig. 2.3 (a), the vertex Vc is the intermediate vertex in the subdivided left triangle and is computed from the vertex Vb by adding the difference vector dx twice. Without calculation error, Vc would have the same coordinate as Vm. However, the calculation has quantization error, so Vc has a different coordinate from Vm. For the same reason, in the right triangle of Fig. 2.3 (a), the vertex Vd computed from the vertex Va with the forward difference vector dy has a different coordinate from Vm. As a result, the small triangles defined by Vc and Vd respectively are not adjacent to each other. Fig. 2.3 (b) shows the rasterization result of the shared edge. Since pixels are lost on the shared edge after rasterization, the rasterization anomaly occurs.
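The effect of quantization on a shared vertex can be illustrated with a small sketch. It does not model the thesis's datapath: the integer (fixed-point) coordinates, the truncating divider `trunc_div` and the helper `walk` are all assumptions, chosen only to show how two triangles that reach the same shared vertex from different start vertices can disagree.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero (assumed quantization model)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def walk(start, end, n_segments, steps):
    """Add the quantized difference vector (end - start) / n_segments `steps` times."""
    d = tuple(trunc_div(e - s, n_segments) for s, e in zip(start, end))
    p = tuple(start)
    for _ in range(steps):
        p = tuple(pc + dc for pc, dc in zip(p, d))
    return p


# Shared edge P-Q in fixed-point units; its exact midpoint is (2.5, 1.5).
P, Q = (0, 0), (5, 3)
mid_from_p = walk(P, Q, 2, 1)  # reached by the triangle that starts at P
mid_from_q = walk(Q, P, 2, 1)  # reached by the triangle that starts at Q
print(mid_from_p, mid_from_q)  # (2, 1) (3, 2)
```

The two results differ by one unit in each component, so the small triangles built on them no longer share an edge, which is the situation of Vc and Vd described above.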

Fig. 2.2. Examples of rasterization anomaly: (a) Teapot, (b) Pawn, (c) Venus, (d) Couch.

Fig. 2.3. Illustration of the rasterization anomaly: (a) after subdivision; (b) rasterization result.

2.2 Edge Function Recovery Scheme

To remove the rasterization anomaly, a recovery scheme based on the edge function method is proposed. The edge function method [16] is used in some raster engines to decide whether a pixel is inside a triangle. An edge function is the line equation through the two vertices of a triangle edge. For example, in Fig. 2.4 (a), the edge function Eab of the left triangle, defined by the vertices Va and Vb, is expressed in Eq. (2.6), where (xa, ya) and (xb, yb) are the coordinates of the vertices Va and Vb.

0

ab ab

ab

ab(x,y): A x B y C

E (2.6)

where Aab(ya-yb), Bab(xb-xa) and Cabxayb-xbya.

The other two edge functions Ebc and Eca can also be similarly derived as follows.

0

bc bc

bc

bc(x,y): A x B y C

E (2.7)

where Abc(yb-yc), Bbc(xc-xb) and Cbcxbyc-xcyb.


0

ca ca

ca

ca(x,y): A x B y C

E (2.8)

where Aca(yc-ya), Bca(xa-xc) and Ccaxcya-xayc.

Fig. 2.4. Illustration of edge functions: (a) before recovery; (b) after recovery.

To test whether a pixel is in a triangle, the coordinate of the pixel is substituted into the three edge functions. If the signs of the three results are all positive, the pixel is regarded as an internal point of the triangle. For example, in Fig. 2.4 (a), the pixel P1 inside the blue triangle yields positive signs for all the edge functions Eab, Ebc and Eca.
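The inside test can be sketched directly from Eqs. (2.6) to (2.8). The function names below are hypothetical, and the example assumes a counter-clockwise triangle in a y-up coordinate system, for which the coefficient definitions above give positive values at interior points. Pixels lying exactly on an edge evaluate to zero and are treated as outside in this sketch; fill rules are not covered.

```python
def edge_function(v0, v1):
    """Coefficients (A, B, C) of the line through v0 and v1, as in Eq. (2.6)."""
    (x0, y0), (x1, y1) = v0, v1
    return (y0 - y1, x1 - x0, x0 * y1 - x1 * y0)


def evaluate(edge, p):
    """Value of A*x + B*y + C at the point p = (x, y)."""
    a, b, c = edge
    return a * p[0] + b * p[1] + c


def is_inside(p, va, vb, vc):
    """True when all three edge functions are positive at p."""
    edges = (edge_function(va, vb), edge_function(vb, vc), edge_function(vc, va))
    return all(evaluate(e, p) > 0 for e in edges)


# Counter-clockwise triangle in a y-up coordinate system (assumed orientation).
va, vb, vc = (0, 0), (4, 0), (0, 4)
print(is_inside((1, 1), va, vb, vc))  # True
print(is_inside((5, 5), va, vb, vc))  # False
```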

As demonstrated in Fig. 2.3 (a), the intermediate vertices Vc and Vd of the two triangles have different coordinates. Therefore, they define two different edge functions, Eca and Ead, respectively, as shown in Fig. 2.4 (a). During rasterization, a pixel such as P1 is regarded as an internal pixel of the left triangle because it is located in the blue region, which is the positive region of all the edge functions Eab, Ebc and Eca. Therefore, P1 is rendered correctly. A pixel such as P2 in the green region has a negative value for both edge functions Eca and Ead and is regarded as outside both triangles. As a result, the pixels in the green region are discarded from the pipeline and are not rendered, and the rasterization anomaly occurs. To eliminate the anomaly, the edge function Eca derived from the left triangle and the edge function Ead derived from the right triangle must be the same. As illustrated in Fig. 2.4 (b), the pixels that were inside the green region of Fig. 2.4 (a) now fall inside one of the triangles because E'ca and E'ad are the same.

To derive the same edge function from both triangles, the coordinates of the vertices Vc and Vd in Fig. 2.3 (a) must not be used for the calculation. Therefore, an edge function recovery scheme is applied to correct the edge function calculation. The proposed scheme takes advantage of the linearity of the line equation to compute the edge functions of the generated triangles. In Fig. 2.5 (a), a triangle is subdivided into four triangles. After subdivision, the edge functions of the small triangles can be computed in the following steps.

Step 1: Compute the edge functions: Eab, Ebc, and Eca of the original triangle using Eqs. (2.6), (2.7) and (2.8).

Step 2: Compute the constant difference values ∆Cab, ∆Cbc and ∆Cca using Eqs. (2.9), (2.10) and (2.11). The slopes of the three edge functions are expressed in the following.

[Eqs. (2.9) to (2.11)]

Step 3: Compute the edge functions of the small triangles in Fig. 2.5 using the computed original edge functions and the difference values. For example, Ekj can be computed using Eq. (2.12).

Fig. 2.5. Illustration of computing the edge functions for small triangles.

$E_{kj}:\ A_{kj}x + B_{kj}y + C_{kj} = 0$ (2.12)

where $A_{kj} = A_{ab}$, $B_{kj} = B_{ab}$ and $C_{kj} = C_{ab} + \Delta C_{ab}$. The constant term Ckj is derived from the constant term Cab of the edge function Eab by adding the difference value ∆Cab of Eq. (2.9). The other edge functions can be computed in a similar manner. Finally, the small triangles can be rendered with these edge functions. With the proposed method, the edge functions derived on the shared edge of any two adjacent triangles are the same. Therefore, the rasterization anomaly can be eliminated. The rendering results using the proposed edge function recovery scheme are shown in Fig. 2.6 (a), (b), (c) and (d).
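The recovery of Eq. (2.12) can be sketched as below. Because the expression for ∆Cab in Eq. (2.9) is not reproduced in this text, the sketch assumes ∆Cab = -Eab(Vc) / N_S, which follows from the linearity of the edge function; it should be read as an assumption, not as the thesis's formula. The helper names are hypothetical, and exact rational arithmetic is used to check that the recovered line passes through the expected intermediate vertices.

```python
from fractions import Fraction


def edge_function(v0, v1):
    """Coefficients (A, B, C) of the line through v0 and v1, as in Eq. (2.6)."""
    (x0, y0), (x1, y1) = v0, v1
    return (y0 - y1, x1 - x0, x0 * y1 - x1 * y0)


def evaluate(edge, p):
    """Value of A*x + B*y + C at the point p = (x, y)."""
    a, b, c = edge
    return a * p[0] + b * p[1] + c


def recovered_edges(edge, opposite_vertex, n_segments):
    """Edge functions parallel to `edge`, one per subdivision line.

    Assumption: delta_c = -E(opposite_vertex) / n_segments, derived from
    linearity; the thesis's own Eq. (2.9) is not available here.
    """
    a, b, c = edge
    delta_c = Fraction(-evaluate(edge, opposite_vertex), n_segments)
    edges = [(a, b, Fraction(c))]
    for _ in range(n_segments - 1):
        a, b, c = edges[-1]
        edges.append((a, b, c + delta_c))  # one addition per new edge function
    return edges


def midpoint(p, q):
    return (Fraction(p[0] + q[0], 2), Fraction(p[1] + q[1], 2))


va, vb, vc = (1, 5), (0, 0), (7, 1)
e_ab = edge_function(va, vb)
e_kj = recovered_edges(e_ab, vc, 2)[1]          # A and B copied, C shifted
v_k, v_j = midpoint(vc, va), midpoint(vb, vc)   # intermediate vertices on ca and bc
print(evaluate(e_kj, v_k), evaluate(e_kj, v_j))  # 0 0
```

Since the coefficients are derived from the original edge function rather than from the accumulated vertex coordinates, they do not depend on the quantization error of the intermediate vertices.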

Fig. 2.6. Rendering results with the proposed edge function recovery scheme: (a) Teapot, (b) Pawn, (c) Venus, (d) Couch.

In Eq. (2.6), evaluating one edge function requires three subtractions and two multiplications. For a subdivided triangle with NS segments on each edge, there are 3NS edge functions in total, and computing them directly requires 3NS(2 muls + 3 subs) = 6NS muls + 9NS subs = 6NS muls + 9NS adds (a subtraction is counted as an addition). The proposed recovery scheme computes each additional edge function of the subdivided triangle by adding one difference value. Therefore, the computational complexity is reduced to 3(2 muls + 3 adds) + 3(2 muls + 1 add) + (3NS - 3)(1 add) = 12 muls + (3NS + 9) adds. Thus, the edge function recovery scheme provides an efficient method for computing the edge functions of subdivided triangles.
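The two operation counts above can be tabulated with a short sketch. The function names are hypothetical; the expressions are copied term by term from the text.

```python
def direct_cost(n_s):
    """(muls, adds) when each of the 3*N_S edge functions is evaluated with Eq. (2.6)."""
    return (3 * n_s * 2, 3 * n_s * 3)


def recovery_cost(n_s):
    """(muls, adds) with the recovery scheme, following the expression in the text."""
    muls = 3 * 2 + 3 * 2
    adds = 3 * 3 + 3 * 1 + (3 * n_s - 3) * 1
    return (muls, adds)


for n_s in (2, 4, 8):
    print(n_s, direct_cost(n_s), recovery_cost(n_s))
```

The multiplication count of the recovery scheme stays at 12 regardless of N_S, while the direct evaluation grows linearly with N_S.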

2.3 Dual Space Subdivision Scheme

In the geometry engine, a sequence of transforms is applied to the vertices. A flow chart of the transforms is shown in Fig. 2.7.


Fig. 2.7. Flow chart of the transforms in the geometry engine.

The modelview transform transforms the vertex from object space to eye space by multiplying it by a 4x4 modelview matrix.

In the projection transform, the eye space coordinate is transformed to clip space by multiplying it by a 4x4 projection matrix.

After the perspective division, the normalized device coordinate (NDC), whose components are each in the range [-1, 1], is transformed to the window (screen) coordinate by the viewport transform.
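The chain of transforms can be sketched as follows. The matrices of the thesis are not reproduced in this text, so the sketch uses an identity matrix as a placeholder projection matrix and expresses the viewport transform as a per-component scale and offset; both are assumptions, as are the function names. The operation counts in the comments correspond to the per-vertex costs used in the complexity analysis of this section.

```python
def mat_vec4(m, v):
    """4x4 matrix times 4-vector: conceptually 16 multiplications and 12 additions."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def perspective_divide(clip):
    """Clip space to NDC: one inverse and three multiplications."""
    x, y, z, w = clip
    inv_w = 1.0 / w
    return (x * inv_w, y * inv_w, z * inv_w)


def viewport(ndc, scale, offset):
    """NDC to window coordinates: three multiplications and three additions."""
    return tuple(n * s + o for n, s, o in zip(ndc, scale, offset))


# Placeholder projection matrix (assumption: identity, for illustration only).
IDENTITY = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))

vertex = (2.0, -2.0, 0.0, 2.0)   # homogeneous input chosen so the arithmetic is exact
clip = mat_vec4(IDENTITY, vertex)
ndc = perspective_divide(clip)   # (1.0, -1.0, 0.0)
# Hypothetical 640x480 viewport with depth range [0, 1].
window = viewport(ndc, (320.0, 240.0, 0.5), (320.0, 240.0, 0.5))
print(window)  # (640.0, 0.0, 0.5)
```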



The conventional subdivision-based algorithm subdivides the triangles in the object space or the eye space. As illustrated in Fig. 2.8, the subdivision is performed at an early stage of the pipeline. Because the subdivision generates a large number of vertices, these vertices add computation and power consumption overhead to the later stages of the pipeline. To reduce the complexity, the dual space subdivision is proposed.

Fig. 2.8. Data flow of eye space subdivision.

Fig. 2.9. Data flow of dual space subdivision.

As illustrated in Fig. 2.9, the subdivision of the proposed scheme is performed after the viewport transform of the pipeline. It subdivides the coordinates in both eye space and window space. The eye space coordinate is required for point-light calculation, and the screen space coordinate is used for edge function calculation and other geometry operations. Because the generated vertices skip the projection transform, perspective division and viewport transform, the computational complexity is remarkably reduced.

The complexity analysis of the eye space subdivision of a single triangle is given in Table 2.1. The left column lists the operations of the subdivision and the right column lists the corresponding complexity. NG is defined as the number of intermediate vertices generated during subdivision. After the triangle is subdivided, there are (NG+3) vertices, including the original three vertices. First, the triangle is subdivided in eye space. Each step of the subdivision algorithm involves two vector additions, one for the eye coordinate (xE, yE, zE) and one for the normal vector (xN, yN, zN), for a total of six additions. Therefore, the subdivision requires 6(NG+2) additions, where the two extra steps calculate the difference vectors. After subdivision, all (NG+3) vertices are transformed by the projection transform, perspective division and viewport transform. As described in this section, the projection matrix is a 4x4 matrix, so the computational complexity of the projection transform is 16(NG+3) muls + 12(NG+3) adds. The perspective division of a vertex requires three multiplications and one inverse, so the total computational complexity is 3(NG+3) muls + (NG+3) invs for (NG+3) vertices. The viewport transform requires three multiplications and three additions for each vertex, so its computational complexity is 3(NG+3) muls + 3(NG+3) adds for (NG+3) vertices.

Table 2.1: Complexity analysis of the eye space subdivision

Operations                                  Computational Complexity
Subdivision for 6 components:               6(NG+2) adds
  Eye coordinate: (xE, yE, zE)
  Normal: (xN, yN, zN)
Projection transform for NG+3 vertices      16(NG+3) muls + 12(NG+3) adds
Perspective division for NG+3 vertices      3(NG+3) muls + (NG+3) invs
Viewport transform for NG+3 vertices        3(NG+3) muls + 3(NG+3) adds
Total                                       (22NG+66) muls + (21NG+57) adds + (NG+3) invs

In contrast to the eye space subdivision, the dual space subdivision subdivides triangles after the viewport transform. Thus, the projective transform, perspective division and viewport transform are performed only for the three original vertices. The complexity is listed in Table 2.2. After the viewport transform, the eye coordinates, the normal vector and the window coordinate are subdivided. To obtain perspective-correct eye coordinates and normal vectors for the intermediate vertices, a setup for perspective correction is performed by dividing the eye coordinates and normal vectors by the wclip term. The computational complexity of the setup is six multiplications for each vertex. After the setup, the subdivision is performed for the coordinates in the two spaces and for the normal vector, with a computational complexity of 10(NG+2) additions. The final step is the perspective correction, which divides the eye coordinates and normal vectors by the 1/wclip term of the intermediate vertices. Since there are NG intermediate vertices and six divisions are required for each perspective correction, the computational complexity is 6NG muls + NG invs.

Table 2.2: Complexity analysis of the perspective correct dual space subdivision

Operations                                       Computational Complexity
Projective transform for 3 vertices              3x16 muls + 3x12 adds
Perspective division for 3 vertices              3x3 muls + 3 invs
Viewport transform for 3 vertices                3x3 muls + 3x3 adds
Setup for perspective-correct subdivision        3x6 muls
Subdivision for 10 components                    10(NG+2) adds
  (eye coordinate, normal, screen coordinate)

[...] coordinates and normal vectors for light intensity calculation. The computation can be further reduced when perspective-incorrect subdivision is used, because the setup and correction can then be skipped. The resulting perspective incorrectness of the intensity on the rendered object can be neglected because the human eye is not sensitive to small differences in light intensity. The complexity of the proposed perspective-incorrect dual space subdivision scheme is listed in Table 2.3.

Table 2.3: Complexity analysis of the perspective incorrect dual space subdivision

Operations                                       Computational Complexity
Projective transform for 3 vertices              3x16 muls + 3x12 adds
Perspective division for 3 vertices              3x3 muls + 3 invs
Viewport transform for 3 vertices                3x3 muls + 3x3 adds
Subdivision for 10 components:                   10(NG+2) adds
  Eye coordinate: (xE, yE, zE)
  Normal: (xN, yN, zN)
  Screen coordinate: (xw, yw, zw, 1/wclip)
Total                                            66 muls + (10NG+65) adds + 3 invs
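The totals of Table 2.1 and Table 2.3 can be compared for a few values of NG with the sketch below. The function names are hypothetical, the rows are summed exactly as listed in the tables, and the NG values are sample inputs only.

```python
def eye_space_cost(n_g):
    """(muls, adds, invs) for eye space subdivision, summing the rows of Table 2.1."""
    v = n_g + 3                                  # all vertices pass the later stages
    muls = 16 * v + 3 * v + 3 * v
    adds = 6 * (n_g + 2) + 12 * v + 3 * v
    invs = v
    return (muls, adds, invs)


def dual_space_cost(n_g):
    """(muls, adds, invs) for perspective-incorrect dual space subdivision (Table 2.3)."""
    muls = 3 * 16 + 3 * 3 + 3 * 3                # only the 3 original vertices are transformed
    adds = 3 * 12 + 3 * 3 + 10 * (n_g + 2)
    invs = 3
    return (muls, adds, invs)


for n_g in (3, 12, 42):
    print(n_g, eye_space_cost(n_g), dual_space_cost(n_g))
```

In the dual space scheme the multiplication and inverse counts are constant, and only the addition count grows with NG.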

2.4 Triangle Filtering Scheme

To reduce the computation for primitive-level operations, the filtering scheme as shown in Fig. 2.10 is added to the proposed algorithm. The filtering scheme is a hybrid scheme that combines culling/clipping before subdivision and highlight test.

Backface culling in the graphics pipeline tests whether a triangle faces away from the eye, using the sign of the inner product of the face normal vector and the eye direction vector. If a triangle is a backface, it is discarded and not rendered.

In the subdivision algorithm, a triangle is subdivided into small ones. Performing the culling test for these triangles individually adds significant overhead to the computation and power consumption. Because the generated triangles and the original triangle are on the same plane, their face normal vectors are parallel to each other. Therefore, the inner products of these face normal vectors and the eye direction vector have the same sign. This implies that there is no need to perform the backface culling test for each generated triangle, since the results will be the same. Hence, in the proposed algorithm, the subdivision is performed after the culling test. If the original triangle is culled, the subdivision is unnecessary. Otherwise, all generated triangles are rendered without a culling test. Clipping is another primitive-level operation in the pipeline. Since the subdivision is performed after clipping, the triangles generated from the clipped original triangle are guaranteed to be inside the view frustum. Therefore, it is not necessary to re-clip these triangles.
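The argument that one culling test suffices can be checked with a short sketch. The sign convention in `is_backface` (a non-negative inner product with the viewing direction marks a backface) and all names are assumptions; the four small triangles follow the vertex order given in Section 2.1.

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_backface(v0, v1, v2, view_dir):
    """Assumed convention: a triangle whose face normal points along the
    viewing direction (non-negative inner product) is a backface."""
    normal = cross(sub(v1, v0), sub(v2, v0))
    return dot(normal, view_dir) >= 0


def midpoint(p, q):
    return tuple((a + b) / 2 for a, b in zip(p, q))


va, vb, vc = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)
view_dir = (0.0, 0.0, -1.0)
vab, vbc, vca = midpoint(va, vb), midpoint(vb, vc), midpoint(vc, va)
small_triangles = [(va, vab, vca), (vab, vbc, vca), (vab, vb, vbc), (vca, vbc, vc)]

original = is_backface(va, vb, vc, view_dir)
children = [is_backface(*t, view_dir) for t in small_triangles]
print(original, children)  # False [False, False, False, False]
```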

To reduce redundant subdivision, a subdivision-based algorithm usually includes a highlight test scheme. In the proposed algorithm, the mixed-shading scheme [9][10] is adopted for the highlight test. The scheme tests the H·V term of the three original vertices. When any of the H·V terms is greater than the threshold value, the triangle is subdivided. If all H·V terms are smaller than the threshold value, the subdivision is bypassed and the triangle is rendered with Gouraud shading. Thus, redundant primitive operations can be reduced.

Fig. 2.10. Data flow of the triangle filtering scheme: (a) H test passed region; (b) triangle filtering data flow.
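The decision made by the filtering scheme for one triangle can be sketched as a small function. Clipping is omitted, the function name, the return labels and the threshold value 0.9 are hypothetical, and a term exactly equal to the threshold is treated as not passing because the text does not specify that case.

```python
def filter_triangle(is_backface, hv_terms, threshold):
    """Return the path a triangle takes through the filtering scheme."""
    if is_backface:
        return "discard"              # culled before any subdivision
    if any(term > threshold for term in hv_terms):
        return "subdivide"            # highlight test passed
    return "gouraud"                  # rendered directly with Gouraud shading


print(filter_triangle(False, (0.2, 0.95, 0.4), 0.9))  # subdivide
print(filter_triangle(False, (0.2, 0.5, 0.4), 0.9))   # gouraud
print(filter_triangle(True, (0.2, 0.95, 0.4), 0.9))   # discard
```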

2.5 Triangle Setup Variable Sharing Scheme

To reduce the triangle setup and the unnecessary subdivision of vertex attributes, a triangle setup variable sharing scheme is presented in this section. The concept of reusing setup results has been shown in [15]; however, a detailed description is not given there. During rasterization, the vertex attributes are linearly interpolated for each pixel. These attributes include screen coordinates, texture coordinates, depth values, fog factors, light intensities, etc. The interpolation usually makes use of the plane equation [17]. An example is given in Fig. 2.11, where (xi, yi) are the window coordinates of the triangle vertices and ui is the attribute to be interpolated. The attribute plane defined by ui is obtained by solving Eq. (2.17).

[...] coordinates [xi, yi, 1] of the triangle. Therefore, once the inverse matrix is available, it can be used to compute the coefficients for interpolating any attribute of the same triangle. Thus, the cost of setting up one attribute is generally one 3x3 matrix multiplication.

The triangles generated by the subdivision algorithm increase the complexity of triangle setup. Because the generated triangles are on the same plane, they define the same attribute plane for each attribute. The coefficients of the attribute planes can therefore be shared by the generated triangles without being recomputed. As illustrated in Fig. 2.11, the triangle is subdivided into four small triangles, so the original setup cost for one vertex attribute of these triangles is four 3x3 matrix inversions and four 3x3 matrix multiplications. With the setup variable sharing scheme, the setup requires only one 3x3 matrix inversion and one 3x3 matrix multiplication because the pre-computed variables are shared by the small triangles. Reusing these coefficients eliminates both the subdivision and the setup of vertex attributes for the small triangles.

Most rasterization algorithms start rasterization from a pixel with initial attribute values and evaluate the attribute values of the next pixel incrementally. It is therefore necessary to compute the initial attribute values for each generated triangle with Eq. (2.20). This re-setup takes three multiplications for each generated triangle in the tile-based traversal scheme [16].
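The sharing of plane coefficients can be illustrated with a sketch that solves the attribute plane u = a*x + b*y + c through three (x, y, u) points. Cramer's rule is used here in place of the explicit matrix inverse, and the function names are hypothetical; the exact forms of Eqs. (2.17) to (2.20) are not reproduced in this text, so `initial_value` is only an assumed plane evaluation at a start pixel.

```python
from fractions import Fraction


def plane_coefficients(p0, p1, p2):
    """Solve u = a*x + b*y + c through three (x, y, u) points by Cramer's rule."""
    (x0, y0, u0), (x1, y1, u1), (x2, y2, u2) = p0, p1, p2
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    a = Fraction(u0 * (y1 - y2) + u1 * (y2 - y0) + u2 * (y0 - y1), det)
    b = Fraction(u0 * (x2 - x1) + u1 * (x0 - x2) + u2 * (x1 - x0), det)
    c = Fraction(u0 * (x1 * y2 - x2 * y1) + u1 * (x2 * y0 - x0 * y2)
                 + u2 * (x0 * y1 - x1 * y0), det)
    return (a, b, c)


def midpoint(p, q):
    return tuple(Fraction(s + t, 2) for s, t in zip(p, q))


def initial_value(coeffs, x, y):
    """Attribute value at the first pixel of a generated triangle (assumed form)."""
    a, b, c = coeffs
    return a * x + b * y + c


v0, v1, v2 = (0, 0, 1), (4, 0, 5), (0, 4, 9)
shared = plane_coefficients(v0, v1, v2)
# One of the four small triangles, built from the edge midpoints.
small = plane_coefficients(midpoint(v0, v1), midpoint(v1, v2), midpoint(v2, v0))
print(shared == small)  # True
```

Because the small triangle yields the same coefficients as the original triangle, only the initial attribute value has to be evaluated again for each generated triangle.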

Fig. 2.11. Illustration of the triangle setup variable sharing.

Chapter 3

Proposed Geometry Engine Architecture

In this chapter, a power-efficient geometry engine (GE) architecture for the 3D graphics pipeline is proposed. Several kernel blocks, including the primitive input control (PIC), the primitive processing unit (PPU), the vertex processing unit (VPU) and the vertex cache management unit (VCMU), are proposed to optimize the power consumption and to support the scalable quality mechanism via the proposed subdivision algorithm. Users can choose the most efficient configuration for graphics processing according to the required shading quality and the power budget. The supported scalable quality levels are level-0, level-1 and level-2. The overall architecture of the proposed GE is depicted in Fig. 3.1, and detailed descriptions of each block are given in the following sections.

Fig. 3.1. Overall architecture of the proposed GE. Blocks shown: Host IF, Index FIFO, Primitive Input Control (PIC), Vertex Cache Manage Unit (VCMU), Post-TnL Vertex Cache, Subdivision Control (SC), Parameter Registers, Primitive Queue, PPU, VPU, Dispatch Queue1, Dispatch Queue2 and Output Control, with 128b data paths, an input from the Pre-TnL cache and an output to the setup engine.

3.1 Primitive Input Control (PIC)

The primitive input control (PIC) processes the input primitive information from
