Thesis Organization - Design and Implementation of a Reconfigurable Depth Buffer Compression Algorithm for 3D Graphics Systems

Chapter 1 Introduction

1.2 Thesis Organization

The rest of the thesis is organized as follows. Chapter 2 briefly reviews the 3D graphics rendering pipeline and existing depth buffer compression algorithms. Chapter 3 presents the proposed reconfigurable algorithm and architecture. Chapter 4 addresses the simulation results and chip implementation. Finally, brief concluding statements close the thesis.

Chapter 2 3D Graphics Rendering Pipeline and Depth Buffer Compression Schemes

In this chapter, we introduce the fundamental 3D graphics pipeline and existing depth buffer compression algorithms.

2.1 3D Graphics Rendering Pipeline

In this section, a brief overview of the 3D graphics rendering pipeline is given.

Fig. 2.1 shows the conventional rendering pipeline. Polygon-based rendering is one of the mainstream methods for generating 3D graphics [19]. Generally, the pipeline can be divided into two subsystems: the geometry subsystem and the raster subsystem.

Fig. 2.1. 3D graphics rendering pipeline.

2.1.1 Geometry Subsystem

Generally, the geometry subsystem can be subdivided into five stages: triangle decompression, viewing transform, culling and clipping, perspective transform, and lighting. The geometry subsystem first receives a triangle mesh. After triangle decompression, it transforms 3D objects from the world space to the viewing space; this step is called the viewing transformation. Some triangles are invisible, such as back-facing triangles, triangles that are too small, and triangles outside the viewing volume, so they are culled or clipped as early as possible to avoid unnecessary computation. This stage is referred to as culling and clipping. The perspective transformation maps 3D objects from the viewing space to the projection space, i.e., the 3D coordinates of an object are mapped to 2D coordinates. The lighting operation in the geometry subsystem evaluates lighting equations on vertices. Generally, these equations are complex because they simulate lighting effects in the real world.

2.1.2 Raster Subsystem

The raster subsystem renders the transformed polygons generated by the geometry subsystem to the monitor pixel by pixel. The first stage of the raster subsystem is triangle setup, which prepares data such as the triangular shape, bounding rectangle, texture coordinates, and color. After triangle setup, the raster subsystem interpolates all attributes of each pixel inside a triangle. These operations are called scan conversion. Visibility comparison in the raster subsystem detects whether a pixel is covered by other pixels. If pixel A is covered by pixel B, pixel A is dropped immediately to eliminate unnecessary operations. The Z-buffer saves the depth values of the pixels that are not covered by other pixels at that time.

The final stage of the raster subsystem is shading and texturing. At this stage, besides flat shading and Gouraud shading, some advanced and complex lighting algorithms, such as Phong shading, may be applied to every pixel. Lighting is used to calculate the colors of vertices or pixels in relation to the light source in the 3D space. Therefore, the lighting model describes the relationship between vertices or pixels and the light source. One of the popular lighting models is the Phong model, expressed in the following equation.

$I = k_a I_a + k_d I_d (N \cdot L) + k_s I_s (N \cdot H)^{n_s}$ (2.1)

where I_a denotes the intensity of the ambient light; I_s, L, and N denote the intensity of the light source, the unit vector from the pixel to the light source, and the normal vector of the pixel, respectively. H equals (L+V)/2, V denotes the vector from the pixel to the viewer, n_s describes the gloss used to model the highlight, and k_a, k_d, and k_s are the coefficients that model the characteristics of the material.
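As an illustration only, the Phong model can be evaluated per pixel as in the following minimal Python sketch. It assumes that Eq. (2.1) has the standard form I = k_a I_a + k_d I_d (N·L) + k_s I_s (N·H)^n_s. The function names, the renormalization of H, and the clamping of negative dot products to zero are assumptions of this sketch and are not taken from the thesis.

```python
import math

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def phong_intensity(N, L, V, ka, kd, ks, Ia, Id, Is, ns):
    """Evaluate the assumed form of Eq. (2.1) for one pixel."""
    N, L, V = normalize(N), normalize(L), normalize(V)
    # H = (L + V) / 2 as in the text; renormalizing H is an assumption.
    # The sketch does not handle the degenerate case L = -V.
    H = normalize(tuple((l + v) / 2 for l, v in zip(L, V)))
    diffuse = max(dot(N, L), 0.0)          # clamping is an assumption
    specular = max(dot(N, H), 0.0) ** ns   # clamping is an assumption
    return ka * Ia + kd * Id * diffuse + ks * Is * specular
```

When the light and the viewer both lie along the pixel normal, the diffuse and specular factors are both one, so the result reduces to k_a I_a + k_d I_d + k_s I_s.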

Texture mapping needs to access the texture buffer, which requires huge memory bandwidth, and to apply mapping equations. However, texture mapping provides an effective way to mimic the realism of the real world on the display. Several algorithms and architectures have been presented for texture filtering [12] and texture compression [13] [14]. The output of the raster subsystem is transferred to the frame buffer for display on the monitor.

2.1.3 Depth Buffer

In this section, the depth buffer is introduced. The depth buffer, i.e., the Z buffer, saves the depth values corresponding to the pixels at that time. To determine whether a pixel is covered by other pixels, the depth test is performed. The depth test reads and writes the depth buffer whenever a pixel has to be tested.

Thus, the depth test results in heavy traffic on the memory bus. To reduce these memory accesses, more efficient compression is highly demanded. In this thesis, a block of 4x4 or 8x8 pixels, called a tile, is accessed from the depth buffer at a time.
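The read and write traffic caused by the depth test can be seen in a minimal Python sketch. The function name and the convention that a smaller depth value is closer to the viewer are assumptions made for illustration.

```python
def depth_test(depth_buffer, x, y, z_new):
    """Return True and update the buffer if the new fragment is visible."""
    z_old = depth_buffer[y][x]        # one buffer read per tested pixel
    if z_new < z_old:                 # assumption: smaller depth is closer
        depth_buffer[y][x] = z_new    # one buffer write when the test passes
        return True
    return False                      # covered pixel is dropped
```

Every tested pixel costs at least one read, and every visible pixel costs an additional write, which is why the depth buffer dominates memory bus traffic.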

There are several schemes that rely on the depth buffer to reduce memory accesses, such as the hierarchical Z buffer [27], Z-max culling, Z-min culling [26], and the depth filter [28] [29]. Besides these filter-based techniques for reducing memory accesses, offset-based data compression schemes have been investigated to reduce memory bus traffic. The next section briefly introduces these techniques.

2.2 Existing Depth Buffer Compression Schemes

In this section, we give an overview of the state-of-the-art compression schemes.

Generally, these schemes can be divided into six categories: fast z-clears, differential differential pulse code modulation (DDPCM), anchor encoding, the HA compression scheme, plane encoding, and the depth offset compression scheme. They are described as follows.

2.2.1 Fast Z-Clears

Fast z-clears [15] is a simple compression algorithm that is easy to implement.

A dedicated bit indicates whether the tile is cleared. If the tile is cleared, the latest depth values can be written back to the depth buffer without first reading the depth buffer to update them.
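The idea of the clear bit can be sketched as follows. This is a simplified Python model, not the implementation in [15]; the class name `TileTable`, the read counter, and the default clear value are hypothetical and serve only to show that a cleared tile is served without a memory read.

```python
class TileTable:
    """Hypothetical model of a depth buffer with one clear bit per tile."""

    def __init__(self, num_tiles, tile_size=16, clear_value=1.0):
        self.tile_size = tile_size
        self.clear_value = clear_value
        self.cleared = [True] * num_tiles
        self.memory = [[clear_value] * tile_size for _ in range(num_tiles)]
        self.reads = 0  # counts accesses to the (simulated) depth memory

    def clear_all(self):
        # A clear only sets the flag bits; depth memory is not touched.
        self.cleared = [True] * len(self.cleared)

    def read_tile(self, t):
        if self.cleared[t]:
            # Cleared tile: return the clear value without a memory read.
            return [self.clear_value] * self.tile_size
        self.reads += 1
        return list(self.memory[t])

    def write_tile(self, t, values):
        self.memory[t] = list(values)
        self.cleared[t] = False
```

In this model a frame clear costs one bit per tile instead of a full write of every depth value.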

2.2.2 Differential Differential Pulse Code Modulation

The Differential Differential Pulse Code Modulation (DDPCM) [16] scheme is widely applied to depth data compression because the depth values are obtained by linear interpolation in the screen space. DeRoo et al. [16] proposed a depth buffer compression algorithm as illustrated in Fig. 2.2, where the notations are defined by the following equations.

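$\Delta z_4 = z_4 - z_0$ (2.2a)

$\Delta z_8 = z_8 - z_4$ (2.2b)

$\Delta z_{12} = z_{12} - z_8$ (2.2c)

$\Delta^2 z_8 = \Delta z_8 - \Delta z_4$ (2.2d)

$\Delta^2 z_{12} = \Delta z_{12} - \Delta z_8$ (2.2e)

$\Delta^2 z_2 = z_2 - z_1 - \Delta z_1$ (2.2f)

$\Delta^2 z_3 = z_3 - z_2 - \Delta z_2$ (2.2g)

$\Delta^2 z_5 = \Delta z_5 - \Delta z_4$ (2.2h)

$\Delta^2 z_6 = \Delta z_6 - \Delta z_5$ (2.2i)

$\Delta^2 z_7 = \Delta z_7 - \Delta z_6$ (2.2j)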

The DDPCM scheme can achieve a high compression ratio (CR) with an 8x8 tile size, where CR is defined as follows.

$CR = \dfrac{\text{uncompressed bits}}{\text{compressed bits}}$ (2.3)

DeRoo et al. also proposed an extended depth buffer compression scheme, called two-plane mode, to handle specific cases in which a tile can be separated into two planes.

(a) Original tile

z0 z1 z2 z3
z4 z5 z6 z7
z8 z9 z10 z11
z12 z13 z14 z15

(b) Compute 1^st order column differentials

z0 z1 z2 z3
∆z4 ∆z5 ∆z6 ∆z7
∆z8 ∆z9 ∆z10 ∆z11
∆z12 ∆z13 ∆z14 ∆z15

(c) Compute 2^nd order column differentials

z0 z1 z2 z3
∆z4 ∆z5 ∆z6 ∆z7
∆²z8 ∆²z9 ∆²z10 ∆²z11
∆²z12 ∆²z13 ∆²z14 ∆²z15

(d) Compute 2^nd order row differentials

z0 ∆z1 ∆²z2 ∆²z3
∆z4 ∆²z5 ∆²z6 ∆²z7
∆²z8 ∆²z9 ∆²z10 ∆²z11
∆²z12 ∆²z13 ∆²z14 ∆²z15

Fig. 2.2. Illustration of DDPCM scheme.
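The differential steps of Fig. 2.2 and Eqs. (2.2a)-(2.2j) can be sketched in Python as follows. This is a minimal illustration for a 4x4 tile of integer depths; the function name is hypothetical, ∆z1 = z1 - z0 is assumed by analogy with Eq. (2.2a), and the bit-level packing of the results is not modeled.

```python
def ddpcm_transform(tile):
    """Apply the DDPCM differential steps to a 4x4 tile, tile[row][col]."""
    # (b) 1st order column differentials for rows 1..3, e.g. dz4 = z4 - z0.
    first = [row[:] for row in tile]
    for r in range(1, 4):
        for c in range(4):
            first[r][c] = tile[r][c] - tile[r - 1][c]

    # (c) 2nd order column differentials for rows 2..3, e.g. d2z8 = dz8 - dz4.
    out = [row[:] for row in first]
    for r in (2, 3):
        for c in range(4):
            out[r][c] = first[r][c] - first[r - 1][c]

    # (d) Row differentials on the two top rows.
    # Row 0: z0, dz1 = z1 - z0 (assumed), then d2z_c as in (2.2f)-(2.2g).
    out[0][1] = tile[0][1] - tile[0][0]
    for c in (2, 3):
        out[0][c] = (tile[0][c] - tile[0][c - 1]) - (tile[0][c - 1] - tile[0][c - 2])
    # Row 1: dz4 is kept, then d2z_c = dz_c - dz_(c-1) as in (2.2h)-(2.2j).
    for c in (1, 2, 3):
        out[1][c] = first[1][c] - first[1][c - 1]
    return out
```

For a tile covered by a single plane, such as z = 100 + 3x + 5y, only z0 and the two 1^st order differentials are non-zero, and all 2^nd order differentials vanish. This is the property that the scheme exploits.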

2.2.3 Anchor Encoding

Van Dyke and Margeson [17] proposed a compression scheme similar to the DDPCM scheme. Instead of using the upper-left pixel as the reference point, this compression algorithm selects a fixed anchor point, z0, at another position in the tile, as shown in Fig. 2.3. All that has to be saved is a 16-bit anchor point, a 7-bit x differential, a 7-bit y differential, and 5-bit 2^nd order differentials.

In fact, anchor encoding does not achieve a better compression ratio than the DDPCM scheme [21].

[Figure: 4x4 tile showing the anchor point z0, the differentials ∆x and ∆y, and the remaining pixels p.]

Fig. 2.3. Illustration of anchor encoding.

2.2.4 HA Compression Scheme

Hasselgren and Akenine-Möller proposed a state-of-the-art depth buffer compression scheme, which can achieve high CR by exploiting the continuity of interpolated depth values in the screen space [21] .

The plane mode indicates how many reference points are used to compute differentials. The HA compression scheme includes one-plane and two-plane modes.

In one-plane mode, only one reference point is used for compression; in two-plane mode, two reference points are used. The operations of the one-plane mode are illustrated in Fig. 2.4. The two 1^st order differentials are ∆z1 and ∆z4. Except for z0, ∆z1, and ∆z4, the remaining values are called 2^nd order differentials. An example of the one-plane mode is illustrated in Fig. 2.5. In this example, each 2^nd order differential is saved in only 1 bit, which is why this algorithm can achieve a better CR than the other compression schemes. In two-plane mode, the one-plane-mode operations are applied twice, once for each of the two reference points.

Fig. 2.6 depicts an example of two-plane mode. Furthermore, there are two kinds of combination cases in the two-plane mode, the rising and falling cases, as shown in Fig. 2.7, where R denotes the reference point. In the rising case, the slope of the break points is rising, which is why it is called the rising case. Similarly, the slope of the break points in the falling case is falling. These two kinds of combination cases increase the compression flexibility and thus achieve a higher compression ratio.

The two-plane mode has break-point information composed of 0’s and 1’s. Because two sets of differentials with two different reference points have to be combined, the break points indicate which differential set should be chosen at each position. A break point marks a position where the 2^nd order differential is larger than the HA compression scheme can represent. Additionally, as shown in Fig. 2.6, this scheme can handle two-plane cases without relying on the fixed-position reference points of the extended DDPCM scheme.

Fig. 2.4. Illustration of one-plane mode compression.


Fig. 2.5. Example of one-plane mode using HA compression scheme.

[Figure panels: (a) Original tile. (b) Compute 1^st order differentials based on upper-left pixel. (c) Compute 2^nd order differentials based on upper-left pixel and add one to 2^nd order differentials above the red line. (d) Compute 1^st order differentials based on lower-right pixel. (e) Compute 2^nd order differentials based on lower-right pixel. (f) Combine two sets of differentials. Values shown: Ref. pt. = 0, ∆z1 = 0, ∆z4 = 0; Ref. pt. = 8, ∆z1 = 0, ∆z4 = -1.]

Fig. 2.6. Example of two-plane mode using HA compression scheme.

(a) Rising case (b) Falling case

Fig. 2.7. Two kinds of cases supported by HA compression scheme and corresponding break-point maps.

2.2.5 Plane Encoding

Unlike the compression algorithms that exploit the continuity of interpolated depth values in the screen space, plane encoding labels the triangles in a range of tiles and saves these index numbers. When a pixel is rendered, the depth value corresponding to its coordinate has to be computed. Van Hook [18] and Liang et al. [19] both presented compression schemes similar to plane encoding. Fig. 2.8 shows the abstract concept of plane encoding. Plane encoding can handle several overlapping triangles in a single tile, which makes it suitable for large tile sizes. The drawback is that it must store indices and the corresponding counter value in the depth tile cache [21].

Fig. 2.8. Example of plane encoding.

2.2.6 Depth Offset Compression

Morein and Natale [20] presented depth offset compression as illustrated in Fig. 2.9.

For tile-based rendering, assume that we save the Z-max value (maximum depth value) and the Z-min value (minimum depth value) of a tile. The depth values of a tile are categorized into representable and unrepresentable ranges. The representable ranges consist of two regions, one based on the Z-max value and one based on the Z-min value.

Hasselgren and Akenine-Möller [21] also presented a modified scheme with two kinds of representable ranges in depth offset compression, one using 12 bits per pixel to store the offsets and one using 16 bits per pixel. If the minimum and maximum values are already stored in the tile table, this scheme uses 12 or 16 bits per pixel and results in a higher CR [25].

If the Z-max and Z-min values of the compressed tile are already stored, this scheme can be applied without extra cost. It does not work well for high CR values, but it obtains excellent compression probabilities for low CR values [21].
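A minimal Python sketch of the representable-range idea is given below. The per-pixel layout, a selector for the Z-min or Z-max region followed by an offset of `offset_bits` bits, is an assumption made for illustration; the exact formats of [20] and [21] are not reproduced here, and the function names are hypothetical.

```python
def depth_offset_encode(tile, offset_bits):
    """Encode integer depths as offsets from Z-min or Z-max.

    Returns (z_min, z_max, encoded), or None if any depth value falls in
    the unrepresentable range between the two regions.
    """
    z_min, z_max = min(tile), max(tile)
    limit = 1 << offset_bits              # number of representable offsets
    encoded = []
    for z in tile:
        if z - z_min < limit:
            encoded.append((0, z - z_min))    # region based on Z-min
        elif z_max - z < limit:
            encoded.append((1, z_max - z))    # region based on Z-max
        else:
            return None                       # tile stays uncompressed
    return z_min, z_max, encoded

def depth_offset_decode(z_min, z_max, encoded):
    """Rebuild the depth values from the selectors and offsets."""
    return [z_min + off if ref == 0 else z_max - off for ref, off in encoded]
```

A tile whose depths cluster near its minimum and maximum, for example a foreground object in front of a distant background, compresses well; a tile with values spread across the whole range does not.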

[Figure: depth axis from the Z-min value to the Z-max value, divided into a representable range, an unrepresentable range, and a representable range.]

Fig. 2.9. Illustration of depth offset compression.

Chapter 3 Proposed Reconfigurable Compression Algorithm and Architecture

In this chapter, we propose a reconfigurable algorithm and architecture for depth buffer compression. As the scene changes, the proposed algorithm adaptively employs three compression schemes, the 2-bit DDPCM [16], 1-bit HA [21], and 7-bit DDPCM schemes, to generate 11 compression modes. The presented 7-bit DDPCM scheme is similar to the 2-bit DDPCM scheme but uses 7 bits to save each 2^nd order differential. The data flow graph of the proposed algorithm shows how the compression modes differ.

The corresponding reconfigurable architecture, consisting of three stages, is presented at the end of this chapter.

3.1 Proposed Reconfigurable Algorithm

In this section, the proposed algorithm is discussed in detail by means of its data flow graph.

3.1.1 Plane Type and Combination Case

In the proposed algorithm, the plane type, also referred to as the plane mode in the 1-bit HA compression scheme, is also considered. Different from the HA scheme [21], the compression scheme selection (CSS), which will be discussed later, is performed after the two-plane differential combination for a hardware-oriented design. Fig. 3.1 illustrates how to compute two sets of differentials according to two different reference points and how to combine the two planes. Furthermore, we extend the original two combination cases into four combination cases, as shown in Fig. 3.2: the rising, falling, vertical, and horizontal cases in the two-plane type. These four kinds of combination cases increase the compression flexibility and thus achieve a higher compression ratio.

[Figure panels: (a) Original tile with depth values. (b) Compute 2^nd order differentials based on z0. (d) Combine two differential planes based on z0 and z15, respectively.]

Fig. 3.1. Two-plane type of the proposed reconfigurable algorithm.

(a) Rising case (b) Falling case (c) Vertical case (d) Horizontal case

Fig. 3.2. Four kinds of combination cases supported by the proposed algorithm and corresponding break-point maps.

3.1.2 Compression Schemes

To increase the CR of the depth buffer compression, we employ three kinds of compression schemes in the proposed algorithm. These schemes, the 1-bit HA [21], 2-bit DDPCM [16], and 7-bit DDPCM schemes, are chosen adaptively with the aim of achieving the highest compression ratio.

The difference among these three schemes is the bit length used to store each differential. With the 1-bit HA and 2-bit DDPCM schemes, each differential is stored in only one bit and two bits, respectively. Although the 1-bit HA and 2-bit DDPCM schemes store differentials compactly, they still limit the CR for more complex 3D scenes. To obtain a more stable CR, we additionally use the 7-bit DDPCM scheme in this thesis.

The ranges covered by each compression scheme are as follows. The 1-bit HA scheme covers the differential set {0, 1}, the 2-bit DDPCM scheme covers the differential set {-1, 0, 1}, and the 7-bit DDPCM scheme covers the differential set {-64, -63, ..., 62, 63}. Additionally, the HA scheme can be divided into two types. In the type 1 HA scheme, all the 2^nd order differentials are elements of the set {-1, 0}. One is added to these 2^nd order differentials and one is subtracted from the 1^st order differentials such that all the differentials are elements of the set {1, 0}. Therefore, each differential can be saved in only one bit.

In the type 2 HA scheme, on the other hand, all the differentials are already elements of the set {1, 0}, so no addition or subtraction is needed.
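The range conditions can be sketched as a simple selection function in Python. It assumes that the 2-bit DDPCM scheme covers {-1, 0, 1} and that the 7-bit DDPCM scheme covers the integers from -64 to 63. The function name, the returned labels, and the order of preference (smallest bit length first, type 2 before type 1) are assumptions of this sketch rather than the thesis's compression scheme selection logic.

```python
def select_scheme(second_order):
    """Pick the cheapest scheme whose range covers all 2nd order differentials."""
    values = set(second_order)
    if values <= {0, 1}:
        return "1-bit HA (type 2)"   # already in {0, 1}, no adjustment
    if values <= {-1, 0}:
        return "1-bit HA (type 1)"   # add one to map {-1, 0} onto {0, 1}
    if values <= {-1, 0, 1}:
        return "2-bit DDPCM"
    if all(-64 <= v <= 63 for v in values):
        return "7-bit DDPCM"
    return None                      # outside every supported range
```

For example, `select_scheme([0, -1, 0])` selects the type 1 HA scheme, while a single differential of 2 forces the 7-bit DDPCM scheme.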

All compression schemes can be applied to one-plane and two-plane types. In addition, we divide a tile into two parts including vertical and horizontal parts. The horizontal part stands for the positions, z₂, z₃, z₅, z₆, z₇, z₉, z₁₀, z₁₁, z₁₃, z₁₄, and z₁₅, in Fig. 2.4. The vertical part stands for the positions, z8 and z12, in Fig. 2.4. In these two parts, different compression schemes can be applied. For example, the vertical part applies the 1-bit HA scheme and the horizontal part applies the 2-bit DDPCM scheme.

Combining the plane types and the schemes used yields the 11 compression modes listed in Table 3.1: the two plane types times five scheme combinations generate ten modes, and one uncompression mode brings the total to 11.

Table 3.1. Proposed compression modes.

Compression Mode Name | Mode Description
OP-HA-HA | 1-bit HA scheme is applied in both the vertical and horizontal parts under one-plane type
OP-2bDDPCM-HA | 2-bit DDPCM and 1-bit HA schemes are applied in the vertical and horizontal parts, respectively, under one-plane type
OP-7bDDPCM-HA | 7-bit DDPCM and 1-bit HA schemes are applied in the vertical and horizontal parts, respectively, under one-plane type
OP-7bDDPCM-2bDDPCM | 7-bit DDPCM and 2-bit DDPCM schemes are applied in the vertical and horizontal parts, respectively, under one-plane type
OP-7bDDPCM-7bDDPCM | 7-bit DDPCM scheme is applied in both the vertical and horizontal parts under one-plane type
TP-HA-HA | 1-bit HA scheme is applied in both the vertical and horizontal parts under two-plane type
TP-2bDDPCM-HA | 2-bit DDPCM and 1-bit HA schemes are applied in the vertical and horizontal parts, respectively, under two-plane type
TP-7bDDPCM-HA | 7-bit DDPCM and 1-bit HA schemes are applied in the vertical and horizontal parts, respectively, under two-plane type
TP-7bDDPCM-2bDDPCM | 7-bit DDPCM and 2-bit DDPCM schemes are applied in the vertical and horizontal parts, respectively, under two-plane type
TP-7bDDPCM-7bDDPCM | 7-bit DDPCM scheme is applied in both the vertical and horizontal parts under two-plane type
Uncompression | Unsupported combination cases in two-plane type
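The count of 11 modes can be checked with a short Python sketch that enumerates the mode names of Table 3.1. The constant and function names are hypothetical; the name format (plane type, vertical-part scheme, horizontal-part scheme) follows the table.

```python
# (vertical part, horizontal part) scheme combinations from Table 3.1.
SCHEME_PAIRS = [
    ("HA", "HA"),
    ("2bDDPCM", "HA"),
    ("7bDDPCM", "HA"),
    ("7bDDPCM", "2bDDPCM"),
    ("7bDDPCM", "7bDDPCM"),
]

def compression_modes():
    """List the mode names: 2 plane types x 5 scheme pairs + uncompression."""
    modes = [
        f"{plane}-{vertical}-{horizontal}"
        for plane in ("OP", "TP")          # one-plane and two-plane types
        for vertical, horizontal in SCHEME_PAIRS
    ]
    modes.append("Uncompression")
    return modes
```

Note that only five of the nine possible scheme pairs appear in the table: the vertical part never uses a scheme with fewer bits than the horizontal part.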

3.1.3 Data Flow

The data flows of the proposed algorithm, depicted in Fig. 3.3 (a)-(f), are described in the following, where the bold solid lines in Fig. 3.3 (a)-(f) indicate the flows for the different cases. Fig. 3.3 (a) shows the one-plane type; Fig. 3.3 (b) shows the two-plane type for the rising, vertical, and horizontal cases; Fig. 3.3 (c) shows the two-plane type for the falling case; and Fig. 3.3 (d)-(f) show the data flows in uncompression mode.

In detail, Fig. 3.3 (d) illustrates the case in which the two sets of break points according to the upper-left and lower-right pixels are both unsupported. Fig. 3.3 (e) and (f) show the cases in which the set of differentials according to the 2^nd reference point in the two-plane type, including the rising, vertical, horizontal, and falling cases, does not pass the break-point match. Fig. 3.4 shows an example of uncompression mode for case 2, assuming that only the 2-bit DDPCM scheme is applied. In Fig. 3.4, the break points of the differentials according to the upper-left pixel are determined to be a rising case. However, the break points of the differentials according to the lower-right pixel are determined to be an uncompressed case. Because the two sets of break points are determined to be different cases, this kind of tile is finally classified into the uncompression mode.

[Figure: data flow diagrams. Recoverable block labels: compute 2^nd order differentials according to upper-left pixel; break-point generation and one-plane mode check; break-point map check and two-plane mode check.]

Fig. 3.3. (a) Data flow illustration of the proposed reconfigurable depth buffer compression in one-plane type. (b) Data flow illustration of the proposed reconfigurable depth buffer compression in two-plane type for rising/vertical/horizontal cases.

Fig. 3.3. (c) Data flow illustration of the proposed reconfigurable depth buffer compression in two-plane type for falling cases. (d) Data flow illustration of the proposed reconfigurable depth buffer compression in uncompression mode for case 1.

Fig. 3.3. (e) Data flow illustration of the proposed reconfigurable depth buffer compression in uncompression mode for case 2. (f) Data flow illustration of the proposed reconfigurable depth buffer compression in uncompression mode for case 3.
