Chapter 2 Related Works
2.4 PRT compression method
Principal component analysis (PCA) [10] is a traditional method in
statistics. It has also been used in computer graphics research, for example by
James et al. [8], Matusik et al. [15], Vasilescu et al. [24], and Wood et al. [27].
PCA finds the major components of high-dimensional sample data and
transforms the data into a low-dimensional data set. In this thesis, we also apply
PCA to approximate the radiance transfer matrices.
Chapter 3
System overview
Our system consists of two major processes: the pre-computed process and
the rendering process. The input of the pre-computed process is a 3D model,
and its outputs are the principal components of the radiance transfer matrices.
These principal components are also one of the inputs of the rendering process.
After the rendering process, the light transport results of the 3D model are
displayed on the screen. In this chapter, we discuss the two processes
individually.
3.1 Pre-computed process
Figure 3.1: Pre-computed process flow chart
A 3D model is the input of the pre-computed process. The PRT computation
module outputs a PRT vector for each vertex, obtained by projecting the
radiance transfer matrix of the vertex onto the SH basis. The PRT vectors of all
vertices form the input of the LBG clustering module, which classifies them
into several clusters. The PCA compressing module then compresses each
cluster, and the compressed data is saved for the rendering process. We
describe the three modules in the following.
3.1.1 PRT computation module
The PRT computation of radiance transfer matrices was proposed by Sloan et al.
[22]. They separated the measurement process into two passes: the shadow pass and
the light transport pass. The shadow pass measures the direct shadow effect. The light
transport pass measures the remaining lighting effects, such as reflections and
transmissions.
First, in the shadow pass, we compute the visibility map V_p for each vertex p. The
visibility map is measured by placing the view position at vertex p and rendering the
mesh into a cube map. Next, we sample the visibility map in directions s_d. Only
directions in the hemisphere around the vertex normal need to be sampled. The
shadow transfer matrix is obtained by integrating over all directions in that
hemisphere, as shown in Equation 1.
Here j equals l'(l'+1)+m'+1, y_lm(s) is the m-th SH basis function of order l
evaluated in direction s, and n_p is the normal of vertex p.
Before measuring the light transport pass, we need to project the BRDF onto the SH
basis. Westin et al. [25] proposed a Monte Carlo technique to compute the BRDF
matrix B. In the local frame, the z-axis is the vertex normal and the y-axis is the
tangent vector of the vertex. We then project the BRDF matrix onto the SH basis.
Each direction in the visibility map falls into one of three situations: entirely
shadowed, entirely un-shadowed, or partially shadowed. In the light transport pass,
we only need to measure the directions that are partially shadowed. We update the
transfer matrix iteratively, starting from (M_p)_ij^0, as shown in Equations 2 and 3.
(M_p)_ij is the final radiance transfer matrix of vertex p. q is the point where the
sample ray from vertex p in direction s hits another triangle. R_q(B) denotes the
BRDF matrix rotated so that its coordinate frame is aligned with the local coordinate
frame of the vertex. If light transmission is not considered for the object, we only
need to integrate over the hemisphere. b is a user-defined value that gives the
maximum number of iterations for measuring the radiance transfer matrix.
3.1.2 LBG clustering module
The PRT computation module measures the radiance transfer matrix of each
vertex and projects it onto the SH basis. The projected coefficients of each vertex
are called its PRT vector, and the PRT vectors are the inputs of the LBG clustering
module. We cluster all vectors using the LBG algorithm [13]. The LBG
clustering algorithm is also called the k-means clustering algorithm [20], where k
is the number of clusters into which the data is partitioned. The pseudo code of the
LBG algorithm is shown in Figure 3.2.
Step 0: Initial pass:
For each cluster, assign a random mean value.
Step 1: Cluster pass:
For each vector, find the nearest cluster by calculating the distance to each cluster mean.
Step 2: Update pass:
For each cluster, update the mean by averaging the vectors that belong to the cluster.
Step 3: Check pass:
If the clustering has converged, exit; otherwise go to Step 1.
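The total energy E_i of all vertices is defined in Equation 4, where i denotes the
i-th iteration, C_j is the set of PRT vectors assigned to cluster j, and
d(v, mean_j) is the distance from vector v to the mean of cluster j. The distance
to the mean is the Gaussian distance.
E_i = \sum_{j=1}^{k} \sum_{v \in C_j} d(v, \mathrm{mean}_j)    (4)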
The converged ratio is defined in Equation 5. When the ratio is less than a
user-defined threshold, the LBG clustering algorithm terminates. In our system, the
algorithm also terminates when the total energy equals zero, which may occur when
all data are identical.
[Equation 5: definition of the converged ratio]
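The clustering loop described above can be sketched as follows. This is a minimal illustration, not the implementation used in our system: the function name is hypothetical, the squared Euclidean distance stands in for the distance to the mean, and the converged ratio is assumed to be the relative decrease of the total energy between two iterations.

```python
import numpy as np


def lbg_cluster(vectors, means, threshold=1e-3, max_iters=100):
    """Cluster `vectors` (n x d) starting from the initial `means` (k x d)."""
    vectors = np.asarray(vectors, dtype=float)
    means = np.array(means, dtype=float)
    prev_energy = np.inf
    energy = np.inf
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(max_iters):
        # Step 1, cluster pass: assign every vector to its nearest mean.
        dist = np.linalg.norm(vectors[:, None, :] - means[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Step 2, update pass: each non-empty cluster takes the average of its members.
        for j in range(len(means)):
            members = vectors[labels == j]
            if len(members) > 0:
                means[j] = members.mean(axis=0)
        # Step 3, check pass: total energy (assumed: sum of squared distances).
        energy = float(np.sum(np.linalg.norm(vectors - means[labels], axis=1) ** 2))
        if energy == 0.0:
            break  # second converged state: every vector equals its cluster mean
        ratio = (prev_energy - energy) / energy  # assumed form of the converged ratio
        if ratio < threshold:
            break
        prev_energy = energy
    return labels, means, energy
```

Starting with an infinite previous energy guarantees that the first iteration never passes the convergence test, so at least two passes are made unless the energy is already zero.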
The traditional LBG algorithm initializes the means with randomly sampled values,
which may require unnecessary iterations to converge. In our system, we therefore
first scan all vertices to find the hyper-box of the data. A hyper-box is similar to a
bounding box, except that its dimension is greater than 3. We then randomly pick
several points inside the hyper-box as the initial cluster means.
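A minimal sketch of this initialization is given below. The function name and the use of a uniform distribution inside the hyper-box are assumptions made for illustration.

```python
import numpy as np


def hyperbox_initial_means(vectors, k, seed=0):
    """Pick k initial means uniformly inside the axis-aligned hyper-box of the data."""
    vectors = np.asarray(vectors, dtype=float)
    lo = vectors.min(axis=0)  # lower corner of the hyper-box, one value per dimension
    hi = vectors.max(axis=0)  # upper corner of the hyper-box
    rng = np.random.default_rng(seed)
    return rng.uniform(lo, hi, size=(k, vectors.shape[1]))
```

Every initial mean then lies within the range actually covered by the data, which is the property the hyper-box is meant to provide.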
Furthermore, the LBG algorithm may suffer from another problem, called the “Null
Cluster”, even after it has converged. A null cluster is a cluster to which no data
belongs when the clustering algorithm finishes. It arises because the initial means
are picked randomly. To solve this, when a null cluster occurs we find the cluster
that contains more data than any other cluster and divide its data into two
clusters. The total energy after the split does not exceed the original energy. This
“Null Cluster Avoidance” step is repeated until no null cluster remains. After the
LBG clustering module, the original data are partitioned into several clusters.
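The null cluster avoidance step can be sketched as below. The text does not specify how the largest cluster is divided, so the splitting rule here is an assumption: the member farthest from the cluster mean seeds the empty cluster, and members closer to that seed than to the old mean move with it. The function name is hypothetical.

```python
import numpy as np


def fix_null_clusters(vectors, labels, k):
    """Reassign data so that none of the k clusters is empty, where possible."""
    vectors = np.asarray(vectors, dtype=float)
    labels = np.array(labels, dtype=int)
    while True:
        counts = np.bincount(labels, minlength=k)
        empty = np.flatnonzero(counts == 0)
        if len(empty) == 0:
            break
        big = int(counts.argmax())            # cluster holding the most data
        idx = np.flatnonzero(labels == big)
        members = vectors[idx]
        mean = members.mean(axis=0)
        d_mean = np.linalg.norm(members - mean, axis=1)
        if d_mean.max() == 0.0:
            break                             # all members identical, cannot split
        seed = members[d_mean.argmax()]       # assumed split rule: farthest member
        d_seed = np.linalg.norm(members - seed, axis=1)
        labels[idx[d_seed < d_mean]] = empty[0]
    return labels
```

Each pass fills exactly one empty cluster and leaves at least one member in the cluster that was split, so the loop ends after at most one pass per null cluster.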
3.1.3 PCA compressing module
After the LBG clustering module, we apply PCA to each cluster. The PCA
compressing module analyzes the data and finds the principal components. The
number of principal components is given by the user. In practice, we often use 8 or
12 principal components for each cluster. For each cluster, PCA yields the
principal components of the cluster and a weighting coefficient vector for each
vertex in the cluster. At rendering time, we reconstruct the PRT vectors from the
weighting coefficient vectors and the principal components, as shown in Equation 6.
T_p \approx M_k + \sum_{j=1}^{N} w_{pj} B_j    (6)
Here T_p is the PRT vector of vertex p and N is the number of principal components.
The larger the number of principal components, the smaller the approximation
distortion. M_k is the mean of cluster k, w_pj are the weighting coefficients of
vertex p, and B_j are the principal components of the cluster. M_k and B_j are the
same for all vertices in a cluster, so for each vertex we only need to store w_pj and
the index of the cluster it belongs to.
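The per-cluster compression and reconstruction can be sketched with a singular value decomposition. This is one common way to compute PCA and is only an illustration; the function names are hypothetical and the text does not state which PCA routine our system uses.

```python
import numpy as np


def compress_cluster(prt_vectors, num_pc):
    """Return the cluster mean, principal components, and per-vertex weights."""
    data = np.asarray(prt_vectors, dtype=float)   # one PRT vector per row
    mean = data.mean(axis=0)                      # cluster mean
    centered = data - mean
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_pc]                           # principal components, one per row
    weights = centered @ basis.T                  # weighting coefficients per vertex
    return mean, basis, weights


def reconstruct(mean, basis, weights):
    """Approximate the PRT vectors as mean + weighted sum of principal components."""
    return mean + weights @ basis
```

When the centered data of a cluster spans no more directions than the number of principal components kept, the reconstruction is exact; otherwise the discarded directions are the approximation error.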
After this module, the required storage is decreased. The storage required by the
uncompressed PRT vectors is (NumberOfVertices * VectorSize). After the PCA
compressing module, the required storage is (NumberOfVertices * NumberOfPC +
NumberOfClusters * (NumberOfPC + 1) * VectorSize).
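The two storage formulas can be evaluated directly. The sketch below counts stored values, not bytes, and the sample numbers are an illustrative combination chosen here: the vertex count of the “Dino” model, a vector size of 36, 8 principal components, and 64 clusters.

```python
def uncompressed_storage(num_vertices, vector_size):
    """Values stored when every vertex keeps its full PRT vector."""
    return num_vertices * vector_size


def compressed_storage(num_vertices, num_pc, num_clusters, vector_size):
    """Per-vertex weights plus, per cluster, one mean and num_pc components."""
    return num_vertices * num_pc + num_clusters * (num_pc + 1) * vector_size


before = uncompressed_storage(23984, 36)
after = compressed_storage(23984, 8, 64, 36)
ratio = after / before
```

The per-vertex term dominates the compressed size, so the saving depends mainly on how much smaller the number of principal components is than the vector size.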
3.2 Rendering process
In this section, we describe the rendering process. The flow chart of the rendering
process is shown in Figure 3.3.
Figure 3.3: The rendering process flow chart
There are also three components in this process. The inputs of the process are the
lighting vector and the principal component matrices. The lighting vector is
obtained by projecting the lighting environment onto the SH basis.
Before calculating the dot product of the PRT vectors and the lighting vector, we
need to rotate the lighting vector, because the lighting environment is dynamic and
can be rotated arbitrarily. Since the light source is infinitely distant, translation of
the environment need not be considered. We could re-project the light source onto
the SH basis whenever the lighting environment changes, but this is expensive at
rendering time. Ivanic et al. [6][7] proposed a method to rotate SH coefficients
directly in the SH domain, which costs less than re-projecting the lighting vector
onto the SH basis. We apply their method in our system.
After the pre-computed process, we have the principal components of the clusters
and the weighting vectors of all vertices. In the original PRT, we calculate the dot
product of each PRT vector and the lighting vector to get the lighting effect. After
compression, we only need to calculate the dot products of the principal components
and the lighting vector, and then sum them weighted by the weighting vector, as
shown in Equation 7.
T_p \cdot L \approx M_k \cdot L + \sum_{j=1}^{N} w_{pj} (B_j \cdot L)    (7)
Here T_p is the PRT vector of vertex p, L is the lighting vector, N is the number of
principal components, and M_k, w_pj, and B_j are the cluster mean, the weighting
coefficients, and the principal components used in Equation 6.
We perform the weighted summation on the GPU using a fragment shader. We use the
new features of the fragment shader model 3.0 and perform per-vertex computation.
The principal component matrices are placed in the hardware float constant
registers, and the weighting vector of each vertex is passed as an input of the
shader. We use the High Level Shading Language (HLSL) to program the shader that
computes the weighted summation. Points on the mesh surface other than the mesh
vertices are shaded by hardware interpolation.
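The weighted summation can be sketched on the CPU as follows. This is only an illustration of the arithmetic, not the HLSL shader; the function name is hypothetical and the result is treated as a single scalar per vertex.

```python
import numpy as np


def shade_vertex(mean, basis, weights, light):
    """Lighting of one vertex from its cluster data and the lighting vector."""
    mean = np.asarray(mean, dtype=float)
    basis = np.asarray(basis, dtype=float)      # one principal component per row
    weights = np.asarray(weights, dtype=float)  # one weight per principal component
    light = np.asarray(light, dtype=float)
    pc_dot = basis @ light                      # dot product of each component with the light
    return float(mean @ light + weights @ pc_dot)
```

The dot products of the cluster mean and the principal components with the lighting vector are shared by all vertices of a cluster, so only the short weighted sum differs from vertex to vertex.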
Chapter 4
Triangles Overdraw Reduction
In this chapter, we discuss the triangle overdraw problem and introduce two
methods to reduce it. Section 4.1 introduces the triangle overdraw problem.
Section 4.2 applies geometry coherence to decrease the triangle overdraw.
Section 4.3 introduces a cluster selection algorithm for super-clusters that
reduces the triangle overdraw further.
4.1 The triangles overdraw problem
The triangle overdraw problem means that triangles are drawn more than once
during rendering. This problem arises from a restriction of current graphics
hardware. Recall that in Chapter 3 we cluster the PRT vectors into several
clusters and place the principal components into the registers of the graphics
card. If the number of clusters is large enough, the registers required by all
principal components may exceed the number of registers available on the graphics
hardware. The object then cannot be rendered in one pass, and some triangles are
rendered again in the second or third pass. For example, modern graphics hardware
is equipped with 256 float4 registers (one float4 register holds 4 floats). Suppose
we use an order-6 SH basis for glossy objects and 8 principal components per
cluster, and the PRT vectors are classified into 60 clusters. The number of float4
registers required is ((6*6)*(8+1)*60)/4 = 4860, which is far larger than 256.
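The register arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and rounding a partially filled float4 register up to a whole register is an assumption.

```python
def float4_registers_per_cluster(sh_order, num_pc):
    """float4 registers for one cluster: its mean plus num_pc principal components."""
    floats = sh_order * sh_order * (num_pc + 1)
    return -(-floats // 4)  # ceiling division: a partly used register still counts


def clusters_per_pass(total_registers, sh_order, num_pc):
    """How many whole clusters fit into the available registers at once."""
    return total_registers // float4_registers_per_cluster(sh_order, num_pc)


per_cluster = float4_registers_per_cluster(6, 8)
all_clusters = per_cluster * 60
fit = clusters_per_pass(256, 6, 8)
```

With 256 float4 registers only a few clusters fit at once, which is why the mesh has to be rendered in several passes.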
In our system, we split the mesh into several partitions according to the
distribution of clusters. When rendering, we accumulate the rendering results of
all partitions to get the final result. For example, in Figure 4.1 the mesh is
divided into three clusters, shown in red, blue, and green.
The mesh is split into three partitions, as shown in Figure 4.2.
Figure 4.2: Mesh Partitions
When rendering the vertices of the triangles in a cluster, two situations occur.
If the vertex belongs to the current cluster, we calculate the lighting effect of
the vertex. If the vertex belongs to another cluster, we set the output lighting
effect to zero (i.e. black). After all clusters are rendered, the lighting effects
of all vertices have been calculated, but triangles whose vertices belong to more
than one cluster have been drawn more than once. This is the triangle overdraw.
Figure 4.3 shows the triangle overdraw of the example mesh.
Figure 4.3: Triangles overdraw.
There are three situations of triangle overdraw. First, a triangle overdraw of
one means that all vertices of the triangle belong to the same cluster. Second, a
triangle overdraw of two means that the vertices of the triangle belong to two
clusters, and the triangle is drawn twice. Third, a triangle overdraw of three
means that the vertices of the triangle belong to three clusters, and the triangle
is drawn three times. The accumulated result is shown in Figure 4.4.
Figure 4.4: Accumulated mesh result
In our system, we use the alpha blending function of the graphics card to
accumulate the results of the partitions. We measure the “average triangle
overdraw” to represent the triangle overdraw instead of the total number of
rendered triangles. The average triangle overdraw is the total number of rendered
triangles divided by the number of triangles in the mesh.
In the results, we compare the rendering speed by listing the average triangle
overdraw and the frames per second (FPS). The average triangle overdraw affects
the rendering speed directly.
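The overdraw of a mesh can be counted from the cluster of each vertex. The sketch below assumes that a triangle is drawn once for every distinct cluster among its three vertices and that the average is the total number of draws divided by the number of triangles; the function names are hypothetical.

```python
def triangle_overdraw(triangle, vertex_cluster):
    """Number of times a triangle is drawn: one per distinct cluster of its vertices."""
    return len({vertex_cluster[v] for v in triangle})


def average_triangle_overdraw(triangles, vertex_cluster):
    """Total number of triangle draws divided by the number of triangles."""
    total = sum(triangle_overdraw(t, vertex_cluster) for t in triangles)
    return total / len(triangles)


# Five vertices in three clusters and three triangles given as vertex-index triples.
vertex_cluster = [0, 0, 0, 1, 2]
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
average = average_triangle_overdraw(triangles, vertex_cluster)
```

In this small example the three triangles are drawn once, twice, and three times, which covers the three situations described above.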
4.2 The geometry coherence
In Chapter 3, we cluster the PRT vectors by calculating the Gaussian distances
between each PRT vector and the mean vectors of the clusters. Some vertices may be
assigned to the same cluster although they are not close in 3D space, which causes
extra triangle overdraw. Geometry coherence is an idea to reduce this extra
overdraw. Its main concept is that “when vertices are closer, their PRT vectors
are similar”. In our system, we add geometry coherence to the LBG clustering
module. When clustering the PRT vectors, we calculate not only the Gaussian
distances of the vectors but also the Gaussian distances of the positions.
For example, in Figure 4.5 the purple point is the data point we want to cluster.
The red points are in one cluster and the blue points are in another cluster.
Figure 4.5: The Geometry Coherence example
In our system, we cluster the data by calculating the Gaussian distance of the PRT
vector plus the Gaussian distance of the position vector, as shown in Equation 9.
E_{ij} = \lVert V_i - \mathrm{mean}_j \rVert + \alpha \cdot \mathrm{GeoCoherence}_{ij}    (9)
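Here i denotes the i-th point and j denotes the j-th cluster. α is a user-given
value that controls the effect of geometry coherence. GeoCoherence is the minimum
distance between the position of the data point and the positions of the points in
a cluster:
\mathrm{GeoCoherence}_{ij} = \min_{q \in C_j} \lVert x_i - x_q \rVert    (10)
where x_i is the 3D position of point i and C_j is the set of points in cluster j.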
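The combined clustering cost can be sketched as below. The Euclidean norm is used for both distances as an assumption, and the function names and argument layout are hypothetical.

```python
import numpy as np


def geo_coherence(position, cluster_positions):
    """Minimum distance from a point position to the positions of a cluster's points."""
    diffs = np.asarray(cluster_positions, dtype=float) - np.asarray(position, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))


def assign_with_geometry(prt_vector, position, means, cluster_positions, alpha):
    """Pick the cluster minimizing PRT distance plus alpha times geometry coherence."""
    prt_vector = np.asarray(prt_vector, dtype=float)
    costs = []
    for mean, points in zip(means, cluster_positions):
        prt_dist = float(np.linalg.norm(prt_vector - np.asarray(mean, dtype=float)))
        costs.append(prt_dist + alpha * geo_coherence(position, points))
    return int(np.argmin(costs)), costs
```

With alpha set to zero the assignment reduces to the plain nearest-mean rule; a larger alpha favours clusters that already contain nearby points, even when their mean is slightly farther in PRT space.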
4.3 The clusters selection algorithm
Sloan et al. [21] introduced the idea of super-clusters to reduce the triangle
overdraw, but they did not give a method for selecting the clusters that form a
super-cluster. In the following, we discuss a cluster selection algorithm for
super-clusters.
In Section 4.1, we discussed the cause of triangle overdraw. Although we cannot
place all clusters in one pass, we can place several of them in one pass.
Recall the example in Section 4.1: placing all clusters into registers would
require 4860 float4 registers, while one cluster requires 81 float4 registers. We
can therefore place floor(256/81) = 3 clusters in one rendering pass. This is the
main concept of super-clusters. Our goal is to establish an algorithm that selects
the clusters for each super-cluster so that the triangle overdraw is reduced
further.
The pseudo code of our algorithm is shown in Figure 4.6.
Step 1: Find the unselected cluster which has the most data and put it in cluster C.
Step 2: Find the cluster which reduces the most triangle overdraws of cluster C.
Step 3: Merge the cluster selected in Step 2 into cluster C.
Step 4: If all clusters are selected, then exit.
Else if the registers are full, then empty cluster C and go to Step 1.
Else go to Step 2 to find the next cluster.
Figure 4.6: The clusters selecting algorithm for super-clusters
In Step 3, “merge the cluster” does not mean merging the principal component data
of the two clusters. It means that, the next time we select another cluster in
Step 2, we consider the triangle overdraw reduction with respect to both clusters.
For example, in Figure 4.7 every circle represents a cluster. The intersection of
two circles represents the triangle overdraws that are saved when the two clusters
are selected into the same super-cluster. In this example, the purple cluster is
the one selected because it reduces more triangle overdraws. So “merge” means
merging the triangle overdraw reduction effects when we select another cluster
into a super-cluster in the next iteration.
Figure 4.7: An example of clusters selection
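The greedy selection can be sketched as follows. The data layout is an assumption made for illustration: `sizes` maps each cluster to its amount of data, `shared` maps a pair of clusters to the triangle overdraws saved when both are in the same super-cluster, and `capacity` is the number of clusters that fit into the registers in one pass. All names and the sample numbers are hypothetical.

```python
def select_super_clusters(sizes, shared, capacity):
    """Greedily group clusters into super-clusters of at most `capacity` clusters."""
    remaining = set(sizes)
    super_clusters = []
    while remaining:
        # Step 1: start from the unselected cluster with the most data.
        first = max(sorted(remaining), key=lambda c: sizes[c])
        current = [first]
        remaining.remove(first)
        # Steps 2 to 4: add clusters until the registers are full or none are left.
        while remaining and len(current) < capacity:
            def gain(candidate):
                # "Merged" effect: savings against every cluster already selected.
                return sum(shared.get(frozenset((candidate, s)), 0) for s in current)
            best = max(sorted(remaining), key=gain)
            current.append(best)
            remaining.remove(best)
        super_clusters.append(current)
    return super_clusters


sizes = {"red": 50, "purple": 30, "green": 40, "blue": 10}
shared = {
    frozenset(("red", "purple")): 12,
    frozenset(("red", "green")): 5,
    frozenset(("purple", "blue")): 9,
    frozenset(("green", "blue")): 2,
}
groups = select_super_clusters(sizes, shared, capacity=3)
```

In this example the blue cluster is chosen as the third member only because its saving with the purple cluster is counted together with its saving with the red cluster, which is the merging effect described for Step 3.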
Chapter 5
Experimental Results
In this chapter, we demonstrate the experimental results of our proposed approach.
Our system is implemented in C++ and DirectX 9.0c [16]. It runs on a
Pentium IV 3.4 GHz CPU and an NVIDIA GeForce 6800GT graphics card.
In the test examples, we compare our results with two previous methods: the
original CPCA method and the CPCA method with triangle overdraw reduction. The
image resolution is 1280 × 948 pixels. All the test models are shown in
Figure 5.1. Table 5.1 shows the details of these models. The first row gives the
number of vertices and the second row gives the number of triangles.
Figure 5.1 (a): The “Dino” model
Figure 5.1 (b): The “Horse” model
Figure 5.1(c): The “Bunny” model
             Dino     Horse    Bunny
Vertices     23984    19851    34834
Triangles    47904    39698    69451
In the following tables, the first column gives the method. The second column
gives the clustering time of the LBG algorithm module. The third column gives the
average triangle overdraw. The fourth column gives the frames per second (FPS).
The fifth column gives the average squared error (SE).
First, the measurements of the “Dino” model are shown. The material uses the
Phong BRDF model. Table 5.2 compares our method with the other methods when the
number of clusters is 64. Figure 5.2 shows the rendering results. Figure 5.2 (a)
shows the uncompressed result. Figure 5.2 (b) shows the result of the original
CPCA method without triangle overdraw reduction. Figure 5.2 (c) shows the result
with triangle overdraw reduction applied. Figure 5.2 (d) shows the result of our
method.
Table 5.2: The “Dino” model, 64 clusters, α=0.05
Figure 5.2: The “Dino” model comparisons, 64 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA triangle overdraw reduction; (d) our method.
Method               Clustering time   Average triangles overdraw   FPS     Average SE
Original CPCA        27 m 4 s          1.733425                     22.93   0.005371
CPCA overdraw red.   27 m 27 s         1.646063                     24.43   0.006344
Our method           7 m 31 s          1.404810                     30.52   0.012216
Table 5.3: The “Dino” model, 256 clusters, α=0.05
Figure 5.3: The “Dino” model comparisons, 256 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA triangle overdraw reduction; (d) our method.
Similarly, the results for 128 and 192 clusters are shown in Tables 5.4 and 5.5.
Table 5.4: The “Dino” model, 128 clusters, α=0.05
Table 5.5: The “Dino” model, 192 clusters, α=0.05
The next example is the “Horse” model. The material uses the Phong BRDF
model. Table 5.6 shows the comparison results when the number of clusters is 64.
Figure 5.4 displays the rendering results.
Method                            Clustering time   Average triangles overdraw   FPS     Average SE
Original CPCA                     3 m 46 s          1.510882                     27.70   0.029795
CPCA overdraw red.                4 m 19 s          1.449997                     29.38   0.029944
Our method                        5 m 44 s          1.417578                     30.51   0.029789
CPCA overdraw red. + Our method   4 m 30 s          1.344148                     32.11   0.030263
Table 5.6: The “Horse” model, 64 clusters, α=0.05
Figure 5.4: The “Horse” model comparisons, 64 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA triangle overdraw reduction; (d) our method.
Then we increase the number of clusters to 256. The comparison results are
shown in Table 5.7.
Method                            Clustering time   Average triangles overdraw   FPS     Average SE
Original CPCA                     16 m              1.825407                     25.18   0.006340
CPCA overdraw red.                16 m 56 s         1.724545                     27.70   0.007333
Our method                        20 m 17 s         1.632803                     30.03   0.006301
CPCA overdraw red. + Our method   13 m 40 s         1.574815                     32.54   0.007226
Table 5.7: The “Horse” model, 256 clusters, α=0.05
(a): Uncompressed result (b): Original CPCA method
Table 5.8: The “Horse” model, 128 clusters, α=0.05
Table 5.9: The “Horse” model, 192 clusters, α=0.05
Next, we apply the Cook-Torrance BRDF model [2]. Table 5.10 shows the result
where the number of clusters is 64. Figure 5.6 shows the rendering results.
Table 5.10: The “Horse” model, 64 clusters, α=0.05
Figure 5.6: The “Horse” model comparisons, 64 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA triangle overdraw reduction; (d) our method.
Next for 256 clusters, the results are shown in Table 5.11 and Figure 5.7.
Figure 5.7: The “Horse” model comparisons, 256 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA triangle overdraw reduction; (d) our method.
For 128 and 192 clusters, Tables 5.12 and 5.13 show the results.
Method                            Clustering time   Average triangles overdraw   FPS     Average SE
Original CPCA                     7 m 14 s          1.643282                     26.64   0.136315
CPCA overdraw red.                8 m 9 s           1.608796                     27.45   0.137434
Our method                        10 m 21 s         1.519724                     30.04   0.136325
CPCA overdraw red. + Our method   9 m 22 s          1.457983                     31.62   0.136876
Table 5.12: The “Horse” model, 128 clusters, α=0.05
Table 5.13: The “Horse” model, 192 clusters, α=0.05
Finally, we use the “Bunny” model for testing. We apply the Cook-Torrance
BRDF model. Table 5.14 shows the results for 256 clusters. Figure 5.8 shows the
rendering results.
Table 5.14: The “Bunny” model, 256 clusters, α=0.05
Figure 5.8: The “Bunny” model comparisons, 256 clusters. (a) Uncompressed result; (b) original CPCA method; (c) CPCA triangle overdraw reduction; (d) our method.
Tables 5.15, 5.16, and 5.17 show the results for 64, 128, and 192 clusters.