A Real-Time Global Illumination Algorithm Based on Precomputed Radiance Transfer

Chapter 2 Related Works

2.4 PRT compression method

Principal component analysis (PCA) [10] is a classical method in statistics. PCA has also been used in computer graphics research, for example by James et al. [8], Matusik et al. [15], Vasilescu et al. [24], and Wood et al. [27].

PCA finds the major components of high-dimensional sample data and transforms the data into a low-dimensional data set. In this thesis, we also apply PCA to approximate the radiance transfer matrices.

Chapter 3 System overview

Our system consists of two major processes: the precomputation process and the rendering process. The input of the precomputation process is a 3D model, and its outputs are the principal components of the radiance transfer matrices. These principal components are also one of the inputs of the rendering process, which displays the light transport results of the 3D model on the screen. In this chapter, we discuss the two processes individually.

3.1 Precomputation process

Figure 3.1: Precomputation process flow chart

A 3D model is the input of the precomputation process. The PRT computation module outputs a PRT vector for each vertex, obtained by projecting the vertex's radiance transfer matrix onto the SH basis. The PRT vectors of all vertices are the inputs of the LBG clustering module, which classifies them into several clusters. The PCA compressing module then compresses each cluster, and the compressed data are saved for the rendering process. We describe the three modules below.

3.1.1 PRT computation module

The computation of radiance transfer matrices for PRT was proposed by Sloan et al. [22]. They separate the computation into two passes: the shadow pass and the light transport pass. The shadow pass measures direct shadowing, and the light transport pass measures the remaining lighting effects, such as reflections and transmissions.

First, in the shadow pass, we obtain a visibility map V_p for each vertex p. The visibility map is measured by placing the viewpoint at vertex p and rendering the mesh into a cube map. Next, we sample the visibility map in directions s_d; only the directions in the hemisphere around the vertex normal are needed. The shadow transfer matrix is obtained by integrating over all directions in this hemisphere, as shown in Equation 1.

Here j equals l'(l'+1)+m'+1, y_l^m(s) denotes the value of the SH basis function of band l and index m in direction s, and n_p is the normal of vertex p.
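The linear index j = l(l+1)+m+1 packs all SH basis functions into one vector. The following minimal Python sketch shows the mapping and its inverse, assuming that an "order n" basis means bands l = 0..n-1 (so order 6 gives the 6 × 6 = 36 coefficients used in Chapter 4). The function names are hypothetical and not part of the thesis code.

```python
import math


def sh_index(l: int, m: int) -> int:
    """1-based linear index j = l(l+1) + m + 1 of the SH basis function y_l^m."""
    if not -l <= m <= l:
        raise ValueError("m must satisfy -l <= m <= l")
    return l * (l + 1) + m + 1


def sh_band_and_index(j: int) -> tuple[int, int]:
    """Inverse mapping: recover (l, m) from the 1-based index j."""
    l = math.isqrt(j - 1)  # j - 1 lies in [l^2, l^2 + 2l] for band l
    return l, (j - 1) - l * (l + 1)


def sh_coefficient_count(order: int) -> int:
    """Assumed convention: order n uses bands 0..n-1, i.e. n * n coefficients."""
    return order * order


# Order-6 SH enumerates the indices 1..36 without gaps.
indices = [sh_index(l, m) for l in range(6) for m in range(-l, l + 1)]
print(indices == list(range(1, sh_coefficient_count(6) + 1)))  # True
```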

Before the light transport pass, we need to project the BRDF onto the SH basis. Westin et al. [25] proposed a Monte Carlo technique for computing the BRDF matrix B. In the local frame, the z-axis is the vertex normal and the y-axis is the tangent vector of the vertex. We then project the BRDF matrix onto the SH basis.
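Projecting a function onto the SH basis by Monte Carlo integration is the basic operation behind the BRDF projection (and the lighting projection used later). The sketch below is a simplification, not Westin et al.'s full BRDF-matrix method: it projects a single spherical function, such as one BRDF slice for a fixed incoming direction, onto bands 0-1 only. It assumes real SH without the Condon-Shortley phase (y_1^{-1} ∝ y, y_1^0 ∝ z, y_1^1 ∝ x) and uniform sphere sampling; all names are hypothetical.

```python
import math
import random

SH_C0 = 0.5 * math.sqrt(1.0 / math.pi)    # y_0^0
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))  # scale of the band-1 functions


def sh_basis_band01(x, y, z):
    """Real SH values for bands 0-1, ordered by j = l(l+1)+m+1."""
    return [SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x]


def uniform_sphere_direction(rng):
    """Uniformly distributed unit vector (pdf = 1 / (4*pi))."""
    z = 1.0 - 2.0 * rng.random()
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return r * math.cos(phi), r * math.sin(phi), z


def project_sh_monte_carlo(f, n_samples=20000, seed=0):
    """Estimate c_j = integral over the sphere of f(s) y_j(s) ds."""
    rng = random.Random(seed)
    coeffs = [0.0] * 4
    for _ in range(n_samples):
        s = uniform_sphere_direction(rng)
        value = f(*s)
        for j, basis in enumerate(sh_basis_band01(*s)):
            coeffs[j] += value * basis
    weight = 4.0 * math.pi / n_samples  # divide by the pdf, average the samples
    return [c * weight for c in coeffs]


# A constant function projects exactly onto the band-0 coefficient 2*sqrt(pi).
print(project_sh_monte_carlo(lambda x, y, z: 1.0)[0])
```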

In the visibility map, there are three situations: entirely shadowed, entirely unshadowed, and partially shadowed. In the light transport pass, we only need to process the directions that are partially shadowed. Starting from the shadow transfer matrix (M_p)^0_{ij}, we update the transfer matrix iteratively, as shown in Equations 2 and 3.

M_p is the final radiance transfer matrix of vertex p. q is the point where a sample ray from vertex p in direction s hits another triangle. R_q(B) denotes the BRDF matrix rotated so that its coordinate frame is aligned with the local frame of the vertex. If the object does not exhibit light transmission, we only need to integrate over the hemisphere. b is a user-defined value giving the maximum number of iterations used to compute the radiance transfer matrix.

3.1.2 LBG clustering module

The PRT computation module computes the radiance transfer matrix of each vertex and projects it onto the SH basis. The inputs of the LBG clustering module are these projected coefficients; the projected coefficients of a vertex are called its PRT vector. We cluster all PRT vectors using the LBG algorithm [13], which is also known as the k-means clustering algorithm [20], where k is the number of clusters into which the data are partitioned. The pseudo code of the LBG algorithm is shown in Figure 3.2.

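Step 0: Initial pass:

For each cluster, randomly assign an initial mean.

Step 1: Cluster pass:

For each vector, find the nearest cluster by computing the distance from the vector to each cluster mean.

Step 2: Update pass:

For each cluster, update the mean by averaging the vectors that belong to the cluster.

Step 3: Check pass:

The total energy E_i of all vertices at the i-th iteration is defined in Equation 4, where the distance to a mean is the Euclidean distance:

$$E_i = \sum_{k=1}^{K} \sum_{\mathbf{v} \in C_k} \left\| \mathbf{v} - \mathbf{m}_k \right\|^2 \qquad (4)$$

Here K is the number of clusters, C_k is the set of PRT vectors assigned to cluster k, and m_k is the mean of cluster k.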

The convergence ratio is defined in Equation 5. When the ratio is less than a user-defined threshold, the LBG clustering algorithm terminates.

$$r_i = \frac{E_{i-1} - E_i}{E_i} \qquad (5)$$

In our system, the algorithm is also considered converged when the total energy equals zero, in which case Equation 5 would be infinite. This can occur, for example, when all data are identical.
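Steps 1-3 with the energy of Equation 4 and the ratio of Equation 5 can be sketched as follows. This is a minimal pure-Python illustration, assuming squared Euclidean distances and that the initial means (Step 0) are supplied by the caller; `lbg` and its parameters are hypothetical names, not the thesis implementation.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lbg(vectors, initial_means, threshold=1e-3, max_iterations=100):
    """Cluster `vectors` starting from `initial_means` (Step 0).

    Returns (labels, means, energy)."""
    means = [list(m) for m in initial_means]
    labels, energy, previous_energy = [], None, None
    for _ in range(max_iterations):
        # Step 1: assign every vector to the cluster with the nearest mean.
        labels = [min(range(len(means)), key=lambda k: squared_distance(v, means[k]))
                  for v in vectors]
        # Step 2: move each mean to the average of its members (empty clusters keep theirs).
        for k in range(len(means)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                means[k] = [sum(column) / len(members) for column in zip(*members)]
        # Step 3: total energy (Eq. 4) and convergence ratio (Eq. 5).
        energy = sum(squared_distance(v, means[label]) for v, label in zip(vectors, labels))
        if energy == 0.0:
            break  # every vector equals its mean; Eq. 5 would divide by zero
        if previous_energy is not None and (previous_energy - energy) / energy < threshold:
            break
        previous_energy = energy
    return labels, means, energy


labels, means, energy = lbg([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])
print(labels, means, energy)  # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]] 1.0
```

The zero-energy check comes before the ratio test so that the special converged state described above never reaches the division.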

The traditional LBG algorithm initializes the means with randomly sampled values, which may cost unnecessary iterations before convergence. Therefore, our system first scans all vertices to find the hyper-box of all data. A hyper-box is similar to a bounding box, except that its dimension is greater than 3. We then randomly pick several points inside the hyper-box as the initial cluster means.
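The hyper-box initialization amounts to a per-dimension minimum and maximum followed by uniform sampling inside those bounds. A short sketch, assuming uniform sampling in each dimension and a seeded generator for reproducibility; the function names are hypothetical.

```python
import random


def hyper_box(vectors):
    """Per-dimension minimum and maximum: the bounding box generalised to any dimension."""
    lower = [min(column) for column in zip(*vectors)]
    upper = [max(column) for column in zip(*vectors)]
    return lower, upper


def hyper_box_initial_means(vectors, k, seed=0):
    """Step 0 variant: pick k random points inside the hyper-box of the data."""
    rng = random.Random(seed)
    lower, upper = hyper_box(vectors)
    return [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(k)]


data = [[0.0, 5.0, -1.0], [2.0, 3.0, 4.0], [1.0, 9.0, 0.0]]
print(hyper_box(data))  # ([0.0, 3.0, -1.0], [2.0, 9.0, 4.0])
```

Every initial mean then lies in the region actually occupied by the PRT vectors, rather than anywhere in the coefficient space.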

Furthermore, the LBG algorithm may cause another problem, called a "Null Cluster", even after the state has converged. A null cluster is a cluster to which no data belong after the clustering algorithm finishes; it arises because the initial points are picked randomly. To solve this, when a null cluster occurs, our system finds the cluster that contains more data than any other cluster and divides its data into two clusters. Because each of the two new clusters has its own mean, the total energy does not increase. This "Null Cluster Avoidance" is applied repeatedly until no null cluster remains. After the LBG clustering module, the original data are partitioned into several clusters.
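The thesis does not specify how the largest cluster is divided, so the sketch below assumes one simple rule: cut the member vectors along their widest dimension at the mean, and hand the upper half to the empty cluster id. The function name and the split rule are hypothetical illustrations of "Null Cluster Avoidance", not the thesis code.

```python
def split_largest_cluster(vectors, labels, k):
    """While some cluster id in 0..k-1 is empty, split the most populated cluster
    in two (assumed rule: widest dimension, cut at the mean) and give one half
    to the empty id."""
    labels = list(labels)
    while True:
        counts = [labels.count(c) for c in range(k)]
        empty = [c for c in range(k) if counts[c] == 0]
        if not empty:
            return labels
        largest = max(range(k), key=lambda c: counts[c])
        members = [i for i, label in enumerate(labels) if label == largest]

        def spread(d):
            values = [vectors[i][d] for i in members]
            return max(values) - min(values)

        dim = max(range(len(vectors[0])), key=spread)
        if spread(dim) == 0:
            return labels  # all members identical: nothing can be split
        cut = sum(vectors[i][dim] for i in members) / len(members)
        for i in members:
            if vectors[i][dim] >= cut:
                labels[i] = empty[0]


print(split_largest_cluster([(0,), (1,), (2,), (10,)], [0, 0, 0, 0], 3))  # [0, 2, 2, 1]
```

In practice the LBG iterations would be resumed after each split, so the new means can settle before checking for null clusters again.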

3.1.3 PCA compressing module

After the LBG clustering module, we apply PCA to each cluster. The PCA compressing module analyzes the data and finds the principal components. The number of principal components is specified by the user; in practice, we often use 8 or 12 principal components per cluster. PCA yields the principal components of each cluster and a weighting coefficient vector for each vertex in the cluster. At rendering time, we reconstruct the PRT vectors from the weighting coefficient vectors and the principal components, as shown in Equation 6:

$$\tilde{\mathbf{x}}_p = \mathbf{M}_k + \sum_{j=1}^{N} w_{pj}\, \mathbf{B}_j \qquad (6)$$

Here x̃_p is the approximated PRT vector of vertex p, which belongs to cluster k, and N is the number of principal components; the more principal components we use, the smaller the approximation error. M_k is the mean of the cluster, w_pj is the j-th entry of the weighting coefficient vector of vertex p, and B_j is the j-th principal component of the cluster. All vertices in a cluster share the same M_k and B_j, so each vertex only needs to store its weights w_pj and the index of the cluster it belongs to.
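Per-cluster PCA and the reconstruction of Equation 6 can be sketched without any numerical library by using power iteration with deflation. This is a minimal illustration, assuming a dense covariance matrix and a fixed iteration count; a real implementation would normally call an SVD or symmetric eigensolver instead. The function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pca(vectors, n_components, iterations=200):
    """Return (mean M_k, principal components B_j) of one cluster."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(column) / n for column in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    cov = [[sum(c[r] * c[s] for c in centered) / n for s in range(d)] for r in range(d)]
    components = []
    for _ in range(n_components):
        b = [1.0 + 0.1 * r for r in range(d)]  # arbitrary non-zero start vector
        for _ in range(iterations):
            nb = [dot(row, b) for row in cov]
            norm = math.sqrt(dot(nb, nb))
            if norm < 1e-12:  # remaining variance is ~zero: no more components
                return mean, components
            b = [x / norm for x in nb]
        eigenvalue = dot(b, [dot(row, b) for row in cov])
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[r][s] - eigenvalue * b[r] * b[s] for s in range(d)] for r in range(d)]
        components.append(b)
    return mean, components


def pca_weights(x, mean, components):
    """w_pj = (x_p - M_k) . B_j for every component B_j."""
    centered = [a - m for a, m in zip(x, mean)]
    return [dot(centered, b) for b in components]


def reconstruct(mean, weights, components):
    """Equation 6: M_k + sum_j w_pj B_j."""
    out = list(mean)
    for w, b in zip(weights, components):
        out = [o + w * x for o, x in zip(out, b)]
    return out


points = [[1.0 + t, 1.0 + 2 * t, 1.0 + 3 * t] for t in (-2, -1, 0, 1, 2)]
mean, comps = pca(points, 1)
print(reconstruct(mean, pca_weights(points[3], mean, comps), comps))  # ~[2.0, 3.0, 4.0]
```

Because the sample points lie on a line, a single component reconstructs them up to rounding; with real PRT data the residual error shrinks as N grows.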

After this module, the required storage decreases. The uncompressed PRT vectors require NumberOfVertices × VectorSize values. After the PCA compressing module, the required storage is NumberOfVertices × NumberOfPC + NumberOfClusters × (NumberOfPC + 1) × VectorSize, where the first term holds the per-vertex weights and the second term holds each cluster's mean and principal components.
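As a worked example of the two storage formulas, the sketch below plugs in the vertex count of the "Dino" model from Table 5.1. The VectorSize of 36 (order-6 SH), 8 principal components, and 64 clusters are assumed illustrative settings, and, like the formula above, the per-vertex cluster index is not counted.

```python
def uncompressed_storage(n_vertices, vector_size):
    """NumberOfVertices * VectorSize."""
    return n_vertices * vector_size


def cpca_storage(n_vertices, n_pc, n_clusters, vector_size):
    """Per-vertex weights plus each cluster's mean and principal components."""
    return n_vertices * n_pc + n_clusters * (n_pc + 1) * vector_size


before = uncompressed_storage(23984, 36)
after = cpca_storage(23984, 8, 64, 36)
print(before, after, round(before / after, 2))  # 863424 212608 4.06
```

With these assumed settings the stored values shrink by roughly a factor of four; the cluster term stays small as long as the number of clusters is much smaller than the number of vertices.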

3.2 Rendering process

In this section, we describe the rendering process, whose flow chart is shown in Figure 3.3.

Figure 3.3: The rendering process flow chart

This process also consists of three components. Its inputs are the lighting vector and the principal component matrices. The lighting vector is obtained by projecting the lighting environment onto the SH basis.

Before computing the dot products with the lighting vector, we need to rotate the lighting vector, because the lighting environment is dynamic and can be rotated arbitrarily. Since the light sources are treated as infinitely distant, translation of the environment does not need to be considered. Although we could re-project the light sources onto the SH basis whenever the lighting environment changes, doing so is expensive at render time. Ivanic et al. [6][7] proposed a method to rotate SH coefficients directly in the SH domain, at a lower cost than re-projecting the lighting vector onto the SH basis. We apply their method in our system.
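Why rotation in the SH domain is cheap is easiest to see for the lowest bands: band 0 is rotation invariant, and band 1 rotates like an ordinary 3D vector. The sketch below covers only bands 0-1 and is not Ivanic et al.'s recurrence, which builds the rotation matrices for all higher bands. It assumes real SH without the Condon-Shortley phase (y_1^{-1} ∝ y, y_1^0 ∝ z, y_1^1 ∝ x) and coefficients ordered by j = l(l+1)+m+1; the names are hypothetical.

```python
import math


def rotation_z(theta):
    """3x3 matrix rotating vectors by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotate_sh_band01(coeffs, R):
    """Rotate the environment by R, i.e. L'(s) = L(R^T s), for bands 0-1.

    coeffs = [c for (0,0), (1,-1), (1,0), (1,1)]; band-1 entries act like (y, z, x)."""
    c00, c_y, c_z, c_x = coeffs
    a = [c_x, c_y, c_z]  # band-1 coefficients as an (x, y, z) vector
    ra = [sum(R[i][k] * a[k] for k in range(3)) for i in range(3)]
    return [c00, ra[1], ra[2], ra[0]]


# A light lobe along +x, rotated 90 degrees about z, points along +y.
print(rotate_sh_band01([1.0, 0.0, 0.0, 1.0], rotation_z(math.pi / 2)))
```

Rotating these coefficients costs a small matrix-vector product per band, independent of how many samples were used to project the environment originally.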

After the precomputation process, we have the principal components of every cluster and the weighting vector of every vertex. In the original PRT, the lighting effect of a vertex is the dot product of its PRT vector and the lighting vector. After compression, we only need to compute the dot products of the cluster mean and the principal components with the lighting vector once per cluster, and then form a weighted sum with each vertex's weighting vector, as shown in Equation 7:

$$e_p = \mathbf{M}_k \cdot \mathbf{L} + \sum_{j=1}^{N} w_{pj} \left( \mathbf{B}_j \cdot \mathbf{L} \right) \qquad (7)$$

Here L is the (rotated) SH lighting vector, e_p is the lighting result of vertex p in cluster k, and M_k, w_pj, and B_j are as in Equation 6. Equation 7 follows from substituting Equation 6 into the dot product and using its linearity.

We compute the weighted summation on the GPU. Using the features of Shader Model 3.0, we perform the computation per vertex in the vertex shader: the principal component matrices are placed in the hardware float constant registers, and the weighting vector of each vertex is passed as a shader input. The shader that computes the weighted summation is written in the High Level Shading Language (HLSL). Points on the mesh surface other than the vertices are shaded by hardware interpolation.
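The saving in Equation 7 comes from splitting the work into a per-cluster part (computed once per frame) and a cheap per-vertex weighted sum. The CPU-side Python sketch below mirrors that split and checks it against the uncompressed path (reconstruct with Equation 6, then dot with L). It treats each PRT vector as a plain vector dotted with L, as in the text; the function names and toy numbers are hypothetical, and the real per-vertex part runs in HLSL.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cluster_light_products(mean, components, light):
    """Once per frame and per cluster: M_k . L and B_j . L for every component."""
    return dot(mean, light), [dot(b, light) for b in components]


def shade_vertex(mean_dot, component_dots, weights):
    """Per vertex: Equation 7 as N multiply-adds."""
    return mean_dot + sum(w * d for w, d in zip(weights, component_dots))


def shade_vertex_uncompressed(mean, components, weights, light):
    """Reference: reconstruct the PRT vector (Equation 6), then dot with L."""
    prt = list(mean)
    for w, b in zip(weights, components):
        prt = [p + w * x for p, x in zip(prt, b)]
    return dot(prt, light)


mean = [1.0, 0.0, 0.0, 0.0]
components = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
weights, light = [2.0, 3.0], [0.5, 1.0, 2.0, 4.0]
m_dot, c_dots = cluster_light_products(mean, components, light)
print(shade_vertex(m_dot, c_dots, weights))  # 8.5
```

Per vertex, the cost drops from a VectorSize-long dot product to N multiply-adds, which is what makes the per-vertex shader evaluation cheap.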

Chapter 4 Triangles Overdraw Reduction

In this chapter, we discuss the triangle overdraw problem and introduce two methods to reduce it. Section 4.1 introduces the triangle overdraw problem. Section 4.2 applies geometry coherence to decrease triangle overdraw. Section 4.3 introduces a cluster selection algorithm for super-clusters that further reduces triangle overdraw.

4.1 The triangles overdraw problem

The triangle overdraw problem means that triangles are drawn more than once during rendering. It arises from a restriction of current graphics hardware. Recall from Chapter 3 that we cluster the PRT vectors into several clusters and place the principal components in the graphics card's registers. If the number of clusters is large, the registers required by all principal components may exceed the number of hardware registers, so the object cannot be rendered in one pass, and some triangles must be rendered in a second or third pass. For example, modern graphics hardware provides 256 float4 constant registers (a float4 register holds 4 float values). Suppose we use an order-6 SH basis (6 × 6 = 36 coefficients) for glossy objects with 8 principal components per cluster, and the PRT vectors are classified into 60 clusters. Each cluster stores its mean plus 8 principal components, so the requirement is ((6 × 6) × (8 + 1) × 60) / 4 = 4860 float4 registers, far more than the 256 available.
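The register arithmetic above, and the per-pass capacity it implies, can be written as a small helper. This sketch assumes that all 256 float4 constant registers are available for cluster data (in practice a few are needed for other constants such as the lighting vector or transforms); the names are hypothetical.

```python
import math

FLOATS_PER_REGISTER = 4  # one float4 register holds 4 floats


def registers_per_cluster(sh_order, n_pc):
    """Mean vector plus n_pc principal components, each with sh_order^2 floats."""
    return math.ceil(sh_order * sh_order * (n_pc + 1) / FLOATS_PER_REGISTER)


def registers_required(sh_order, n_pc, n_clusters):
    return math.ceil(sh_order * sh_order * (n_pc + 1) * n_clusters / FLOATS_PER_REGISTER)


def clusters_per_pass(sh_order, n_pc, available_registers=256):
    return available_registers // registers_per_cluster(sh_order, n_pc)


print(registers_required(6, 8, 60))  # 4860
print(registers_per_cluster(6, 8))   # 81
print(clusters_per_pass(6, 8))       # 3
```

Under these assumptions, 60 clusters need at least ⌈60/3⌉ = 20 rendering passes.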

In our system, we split the mesh into several partitions according to the distribution of clusters. When rendering, we accumulate the rendering results of all partitions to obtain the final result. For example, in Figure 4.1 the mesh is divided into three clusters, represented by three colors: red, blue, and green.

The mesh is split into three partitions, as shown in Figure 4.2.

Figure 4.2: Mesh Partitions

When rendering the triangles of a cluster, each of their vertices falls into one of two situations. If the vertex belongs to the current cluster, we calculate its lighting effect. If the vertex belongs to another cluster, we set its output lighting result to zero (i.e., black). After all clusters are rendered, the lighting effects of all vertices have been calculated, but triangles whose vertices belong to different clusters have been drawn more than once: this is triangle overdraw. Figure 4.3 shows all triangle overdraws of the example mesh.

Figure 4.3: Triangles overdraw.

There are three situations of triangle overdraw. First, an overdraw count of one means that all vertices of the triangle belong to the same cluster. Second, an overdraw count of two means that the vertices of the triangle belong to two clusters, so the triangle is drawn twice. Third, an overdraw count of three means that the vertices belong to three clusters, so the triangle is drawn three times. The accumulated result is shown in Figure 4.4.

Figure 4.4: Accumulated mesh result

In our system, we use the alpha blending function of the graphics card to accumulate the results of the partitions. Instead of the total number of rendered triangles, we measure the "average triangle overdraw" to represent triangle overdraw. The average triangle overdraw is the total number of triangle draws over all passes divided by the number of triangles in the mesh.

In the results, we compare rendering speed by listing the average triangle overdraw and the frames per second (FPS). The average triangle overdraw directly affects the rendering speed.
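Given the cluster label of every vertex, the overdraw of a triangle is the number of distinct clusters among its three vertices, and the average triangle overdraw follows directly. A minimal sketch, assuming one rendering pass per cluster as in Section 4.1; the function names are hypothetical.

```python
def triangle_overdraw(triangle, labels):
    """Number of passes that draw this triangle: distinct clusters among its vertices."""
    return len({labels[v] for v in triangle})


def average_triangle_overdraw(triangles, labels):
    return sum(triangle_overdraw(t, labels) for t in triangles) / len(triangles)


labels = [0, 0, 0, 1, 2]                   # cluster of each vertex
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print([triangle_overdraw(t, labels) for t in triangles])  # [1, 2, 3]
print(average_triangle_overdraw(triangles, labels))       # 2.0
```

The three example triangles cover the three situations described above: one, two, and three clusters per triangle.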

4.2 The geometry coherence

In Chapter 3, we cluster the PRT vectors by calculating the Euclidean distances between each PRT vector and the mean vectors of the clusters. Some vertices may therefore fall into the same cluster even though they are not close in 3D space, which increases triangle overdraw. Geometry coherence is an idea for reducing this extra overdraw. Its main concept is that vertices that are close to each other tend to have similar PRT vectors. In our system, we add geometry coherence to the LBG clustering module: when clustering the PRT vectors, we compute not only the Euclidean distances between PRT vectors but also the Euclidean distances between positions.

For example, in Figure 4.5, the purple point is the data point we want to cluster. The red points are in one cluster and the blue points are in another cluster.

Figure 4.5: The Geometry Coherence example

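In our system, we cluster the data using the Euclidean distance between PRT vectors plus a weighted geometric distance between positions, as shown in Equation 9:

$$E_{ij} = \left\| \mathbf{V}_i - \mathbf{mean}_j \right\| + \alpha \cdot \mathrm{GeoCoherence}_{ij} \qquad (9)$$

Here i denotes the i-th point and j the j-th cluster, V_i is the PRT vector of point i, and mean_j is the mean PRT vector of cluster j. α is a user-given value that controls the effect of geometry coherence. GeoCoherence is defined as the minimum distance between the data point and the points in the cluster:

$$\mathrm{GeoCoherence}_{ij} = \min_{q \in C_j} \left\| \mathbf{P}_i - \mathbf{P}_q \right\|$$

where P_i is the position of point i and C_j is the set of points in cluster j.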
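Assigning a point with the geometry-coherent cost of Equation 9 can be sketched as below. This assumes unsquared Euclidean distances for both terms, non-empty clusters, and cluster members taken from the previous LBG iteration; the function names and toy data are hypothetical. The example shows α changing the decision: the PRT term alone prefers cluster 1, but cluster 1 is far away in space.

```python
import math


def geo_coherence(position, cluster_positions):
    """GeoCoherence_ij: minimum distance from point i to the points of cluster j."""
    return min(math.dist(position, q) for q in cluster_positions)


def clustering_cost(prt, position, cluster_mean, cluster_positions, alpha):
    """Equation 9: PRT-space distance plus alpha times the geometric term."""
    return math.dist(prt, cluster_mean) + alpha * geo_coherence(position, cluster_positions)


def assign_cluster(prt, position, means, members, alpha):
    """Index j minimising E_ij over all clusters."""
    return min(range(len(means)),
               key=lambda j: clustering_cost(prt, position, means[j], members[j], alpha))


means = [(0.9, 0.0), (1.0, 0.05)]                # mean PRT vectors (toy 2-D)
members = [[(0.1, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]  # positions of cluster points
print(assign_cluster((1.0, 0.0), (0.0, 0.0, 0.0), means, members, alpha=0.0))   # 1
print(assign_cluster((1.0, 0.0), (0.0, 0.0, 0.0), means, members, alpha=0.05))  # 0
```

Note that the minimum over cluster points makes each cost evaluation proportional to the cluster size; a spatial index would be the usual remedy for large meshes.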

4.3 The clusters selection algorithm

Sloan et al. [21] introduced the idea of super-clusters to reduce triangle overdraw, but they did not give a concrete method for selecting the clusters that form a super-cluster. In the following, we present a cluster selection algorithm for super-clusters.

In Section 4.1, we discussed the cause of triangle overdraw. Although we cannot place all clusters in one pass, we can place a subset of them in one pass. Recalling the example in Section 4.1, placing all clusters in registers would require 4860 float4 registers, while one cluster requires (6 × 6) × (8 + 1) / 4 = 81 float4 registers. We can therefore place ⌊256/81⌋ = 3 clusters in one rendering pass. This is the main concept of super-clusters. Our goal is an algorithm that selects the clusters of each super-cluster so that triangle overdraw is reduced further.

The pseudo code of our algorithm is shown in Figure 4.6.

Figure 4.6: The clusters selecting algorithm for super-clusters

Step 1: Among the unselected clusters, find the cluster that has the most data and put it in cluster C.

Step 2: Find the cluster that reduces the most triangle overdraws with respect to cluster C.

Step 3: Merge the cluster selected in Step 2 into cluster C.

Step 4: If all clusters are selected, then exit.

Else if the registers are full, then empty cluster C and go to Step 1.

Else go to Step 2 to find the next cluster.

In Step 3, "merge the cluster" does not mean merging the principal component data of the two clusters. It means that the triangle overdraw reduction effects of both clusters are taken into account the next time we select another cluster, i.e., in Step 2.

For example, in Figure 4.7, every circle represents a cluster. The intersection of two circles represents the triangle overdraws saved when both clusters are selected into the same super-cluster. … cluster, because the purple cluster will reduce more triangle overdraws. Thus, "merge" means combining the triangle overdraw reduction effects when we select another cluster into the super-cluster in the next iteration.

Figure 4.7: An example of clusters selection
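Steps 1-4 can be sketched as a greedy loop. This is an illustrative reading, not the thesis code: it assumes that the reduction from adding a candidate to C equals the number of triangles touching both the candidate and C (each such triangle is drawn one time fewer), that "registers are full" means a fixed number of clusters per pass (3 in the example above), and it breaks ties by cluster size and then by id. All names are hypothetical.

```python
from collections import Counter


def average_overdraw(triangles, labels, passes):
    """Average number of passes drawing each triangle; `passes` lists cluster ids per pass."""
    pass_of = {c: p for p, group in enumerate(passes) for c in group}
    draws = sum(len({pass_of[labels[v]] for v in tri}) for tri in triangles)
    return draws / len(triangles)


def select_super_clusters(triangles, labels, capacity):
    tri_clusters = [frozenset(labels[v] for v in tri) for tri in triangles]
    sizes = Counter(labels)
    remaining = set(sizes)
    super_clusters = []
    while remaining:
        # Step 1: start a new super-cluster C with the largest remaining cluster.
        seed = max(remaining, key=lambda c: (sizes[c], -c))
        current = {seed}
        remaining.discard(seed)
        while remaining and len(current) < capacity:
            # Step 2: gain = triangles shared by C and the candidate.
            def gain(c):
                return sum(1 for s in tri_clusters if c in s and not s.isdisjoint(current))
            best = max(remaining, key=lambda c: (gain(c), sizes[c], -c))
            # Step 3: merge, so later gains count triangles touching any cluster in C.
            current.add(best)
            remaining.discard(best)
        super_clusters.append(sorted(current))  # Step 4: pass full or all selected
    return super_clusters


labels = [0, 0, 0, 1, 1, 2, 2]
triangles = [(0, 1, 3), (1, 2, 3), (3, 4, 5), (5, 6, 2)]
groups = select_super_clusters(triangles, labels, capacity=2)
print(groups)                                                  # [[0, 1], [2]]
print(average_overdraw(triangles, labels, [[0], [1], [2]]))    # 2.0
print(average_overdraw(triangles, labels, groups))             # 1.5
```

In this toy mesh, grouping clusters 0 and 1 into one pass removes the duplicate draws of the two triangles they share.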

Chapter 5 Experimental Results

In this chapter, we demonstrate the experimental results of our proposed approach. Our system is implemented in C++ with DirectX 9.0c [16] and runs on a Pentium 4 3.4 GHz CPU with an NVIDIA GeForce 6800 GT graphics card.

In the test examples, we compare our results with two previous methods: the original CPCA method and the CPCA method with triangle overdraw reduction. The image resolution is 1280 × 948 pixels. All test models are shown in Figure 5.1, and Table 5.1 shows their details: the first row gives the number of vertices, and the second row gives the number of triangles.

Figure 5.1 (a): The “Dino” model

Figure 5.1 (b): The “Horse” model

Figure 5.1(c): The “Bunny” model

             Dino     Horse    Bunny
Vertices     23984    19851    34834
Triangles    47904    39698    69451

Table 5.1: Details of the test models

In the following tables, the first column lists the methods, the second column the clustering time of the LBG clustering module, the third column the average triangle overdraw, the fourth column the frames per second (FPS), and the fifth (last) column the average squared error (SE).

First, the measurements for the “Dino” model are shown below. The material uses the Phong BRDF model. Table 5.2 shows the comparison with the other methods for 64 clusters, and Figure 5.2 shows the rendering results. Figure 5.2 (a) shows the uncompressed result, (b) the result of the original CPCA method without triangle overdraw reduction, (c) the result with triangle overdraw reduction, and (d) the result of our method.

Table 5.2: The “Dino” model, 64 clusters, α=0.05

Figure 5.2: The “Dino” model comparisons, 64 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA with triangle overdraw reduction; (d) our method.

Method                               Clustering time   Average triangle overdraw   FPS     Average SE
Original CPCA                        27 m 4 s          1.733425                    22.93   0.005371
CPCA overdraw red.                   27 m 27 s         1.646063                    24.43   0.006344
Our method                           7 m 31 s          1.404810                    30.52   0.012216

Table 5.3: The “Dino” model, 256 clusters, α=0.05

Figure 5.3: The “Dino” model comparisons, 256 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA with triangle overdraw reduction; (d) our method.

Similarly, the results for 128 and 192 clusters are shown in Tables 5.4 and 5.5.

Table 5.4: The “Dino” model, 128 clusters, α=0.05

Table 5.5: The “Dino” model, 192 clusters, α=0.05

The next example is the “Horse” model, again with a Phong BRDF material. Table 5.6 shows the comparison results for 64 clusters, and Figure 5.4 displays the rendering results.

Method                               Clustering time   Average triangle overdraw   FPS     Average SE
Original CPCA                        3 m 46 s          1.510882                    27.70   0.029795
CPCA overdraw red.                   4 m 19 s          1.449997                    29.38   0.029944
Our method                           5 m 44 s          1.417578                    30.51   0.029789
CPCA overdraw red. + Our method      4 m 30 s          1.344148                    32.11   0.030263

Table 5.6: The “Horse” model, 64 clusters, α=0.05

Figure 5.4: The “Horse” model comparisons, 64 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA with triangle overdraw reduction; (d) our method.

Then we increase the number of clusters to 256. Table 5.7 shows the comparison results.

Method                               Clustering time   Average triangle overdraw   FPS     Average SE
Original CPCA                        16 m              1.825407                    25.18   0.006340
CPCA overdraw red.                   16 m 56 s         1.724545                    27.70   0.007333
Our method                           20 m 17 s         1.632803                    30.03   0.006301
CPCA overdraw red. + Our method      13 m 40 s         1.574815                    32.54   0.007226

Table 5.7: The “Horse” model, 256 clusters, α=0.05

(a): Uncompressed result (b): Original CPCA method

Table 5.8: The “Horse” model, 128 clusters, α=0.05

Table 5.9: The “Horse” model, 192 clusters, α=0.05

Next, we apply the Cook-Torrance BRDF model [2]. Table 5.10 shows the result

where the number of clusters is 64. Figure 5.6 shows the rendering results.

Table 5.10: The “Horse” model, 64 clusters, α=0.05

Figure 5.6: The “Horse” model comparisons, 64 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA with triangle overdraw reduction; (d) our method.

Next for 256 clusters, the results are shown in Table 5.11 and Figure 5.7.

Figure 5.7: The “Horse” model comparisons, 256 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA with triangle overdraw reduction; (d) our method.

For 128 and 192 clusters, Tables 5.12 and 5.13 show the results.

Method                               Clustering time   Average triangle overdraw   FPS     Average SE
Original CPCA                        7 m 14 s          1.643282                    26.64   0.136315
CPCA overdraw red.                   8 m 9 s           1.608796                    27.45   0.137434
Our method                           10 m 21 s         1.519724                    30.04   0.136325
CPCA overdraw red. + Our method      9 m 22 s          1.457983                    31.62   0.136876

Table 5.12: The “Horse” model, 128 clusters, α=0.05

Table 5.13: The “Horse” model, 192 clusters, α=0.05

Finally, we use the “Bunny” model for testing, with the Cook-Torrance BRDF model. Table 5.14 shows the results for 256 clusters, and Figure 5.8 shows the rendering results.

Table 5.14: The “Bunny” model, 256 clusters, α=0.05

Figure 5.8: The “Bunny” model comparisons, 256 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA with triangle overdraw reduction; (d) our method.

Tables 5.15, 5.16, and 5.17 show the results for 64, 128, and 192 clusters, respectively.
